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While the thermodynamics of DNA hybridization is weh understood, much less is known about the kinetics of 
this classic system. Filling this gap in our understanding has new urgency because DNA nanotechnology often 
depends critically on binding rates. Here we use a coarse-grained model to explore the hybridization kinetics 
of DNA oligomers, finding that strand association proceeds through a complex set of intermediate states. 
Successful binding events start with the formation of a few metastable base-pairing interactions, followed by 
zippering of the remaining bonds. However, despite reasonably strong interstrand interactions, initial contacts 
frequently fail to lead to zippering because the typical configurations in which they form differ from typical 
states of similar enthalpy in the double-stranded equilibrium ensemble. Therefore, if the association process is 
analyzed on the base-pair (secondary structure) level, it shows non-Markovian behavior. Initial contacts must 
be stabilized by two or three base pairs before full zippering is likely, resulting in negative effective activation 
enthalpies. Non-Arrhenius behavior is observed as the number of base pairs in the effective transition state 
increases with temperature. In addition, we find that alternative pathways involving misbonds can increase 
association rates. For repetitive sequences, misaligned duplexes frequently rearrange to form fully paired 
duplexes by two distinct processes which we label 'pseudoknot' and 'inchworm' internal displacement. We 
show how the above processes can explain why experimentally observed association rates of GC-rich oligomers 
are higher than rates of AT-rich equivalents. More generally, we argue that association rates can be modulated 
by sequence choice. 



DNA is central to biology, and has become a key 
ingredient in nanotechnology. Single strands of DNA 
have a sugar-phosphate backbone with bases (adenosine, 
thymine, guanine or cytosine - hereafter referred to as A, 
T, G and C) attached at regular intervals. Watson and 
CriclP showed that hydrogen bonding between A-T and 
G-C base pairs and stacking interactions between adja- 
cent bases result in helical duplexes when two sequences 
are complementary. This rule has been used to design 
structures,^ machines^ and computational circuits^ that 
operate in parallel on a nanometer length scale. In many 
of these systems, assembly or operational dynamics is 
primarily driven by the association, or hybridization, of 
short strands of DNA (oligomers) to form duplexes of or- 
der ten base pairs. Understanding the details of oligomer 
association kinetics, and knowledge of how to accelerate 
or suppress reaction rates, is therefore essential if DNA 
nanotechnology is to fulfil its promise. 

The thermodynamics of DNA duplex formation is well 
understood, and is dominated by states involving either 
strongly bound duplexes, or widely separated strands. 
Therefore, it is well characterized^ ^ and can be described 
by all-or-nothing (two- state) models.^ By contrast, hy- 
bridization kinetics depends on the rarely-visited inter- 
mediate states that lie between these two limits, and is 
therefore much harder to understand. In principle, the 
ensemble of transition pathways may be complex, lead- 
ing to rich and subtle behaviour. Bimolecular association 
rate constants of 10^ — 10^ M~^s~^ have been measured 
at approximately room temperature and at high salt con- 
centrations ([Na+] - 1 M or [Mg^] - 0.01 M) for DNACIEI 



and RNA.IISHini There is agreement that dissociation rates 
increase exponentially with temperature^ lo 12^14^ but au- 
thors h ave re ported association rates that increase ,1^^ 
decreasd^S'H' and behave non-monotonicall}^^ with tem- 
perature. To our knowledge, there have been no sys- 
tematic studies of the consequences of DNA sequence for 
hybridization rates. 

Theoretical models at the level of secondary structure 
(the degree of base pairing) have been proposed to ex- 
plain experi ments w here association rates decrease with 
temper at urc^IS^IIEi These models posit that strands ini- 
tially held together by short duplex sections tend to dis- 
sociate rather than full y hyb ridizing, either because they 
melt extremely quickl}i^^^ or due to a thermodynamic 
barrier to full hybridization.^^ However, hairpins with 
stems as short as three base pairs are thermodynami- 
cally stable,^ and no detailed description of a barrier to 
completing hybridization has been proposed. 

Computer modelling can shed light on the details of 
hybridization kinetics. Ideally, simulations would use 
atomistic potentials such as AMBER^^ for maximal reso- 
lution, but the long time scales involved prevent exhaus- 
tive studies of such systems. To explore reaction path- 
ways, coarse-grained models are needed. These must be 
efficient enough to access the critical time-scales, but de- 
tailed enough to represent key features of the 3D struc- 
ture, the mechanical properties and thermodynamics of 
both single- and double-stranded DNA. A number of 
models have been proposed, but most are either 'lad- 
der' models that do not capture structural and mechan- 
ical properties of DNA^^^^ or have not been carefully 
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parameterized to DNA t her mo dynamicsP^l'^ Hybridiza- 
tion kineti cs hav e been studied in a detailed model known 
as SSPN.lP^I^The authors identified initial binding and 
then 'slithering' of strands past each other as a mecha- 
nism of duplex formation. However, single strands in 
3SPN.1 are overly stiff and have structural and mechan- 
ical properties that are very similar to the duplex state, 
so association necessarily involves two pre-formed helices 
coming into contact. It is unclear whether the same path- 
way will be observed for a model with a more realistic 
description of the single- stranded state. 



Here we apply a recently developed coarse-grained 
model, 'oxDNA',[23Ell to the association of DNA 
oligomers. The model incorporates the average struc- 
tural, mechanical and thermodynamic properties of both 
single- and double-stranded DNA. Duplexes are stiff he- 
lices, but single strands can unstack, allowing them to 
adopt non-helical structures, reproducing their relative 
flexibility. Such flexibility allows oxDNA to capture prop- 
erties such as the formation of hairpins^^ and the force- 
extension properties of single strands. We expect the 
flexibility of single strands to be a critical factor in hy- 
bridization. The robustness of oxDNA has been estab- 
lished by studying a range of phenomena that were not 
used for the initial parameterization. The formation of 
metastable kissing hairpin^^, cruciform structures under 
torsion^^ and liquid crystals at high densit}/^^ are all re- 
produced in a physically reasonable way. Three dynamic 
DNA-based nanodevices have been simulated .^^^^^ The 
calculations reproduced the designed behaviour, but also 
identified key subtleties arising from an interplay of struc- 
tural, mechanical and thermodynamic factors. OxDNA 
undergoes an overstretching transition, with the critical 
force in good agreement with experiment.^^ Most impor- 
tantly, oxDNA quantitatively reproduced the 10^"^-fold 
acceleration of the toehold-mediated strand displacement 
rate with increasing toehold length found by Zhang and 
Winfree^^. As binding to the toehold involves the same 
self-assembly processes as in hybridization, our success 
gives us confidence to use oxDNA to study oligomer as- 
sociation in detail. 



We proceed as follows. After briefly presenting the 
model and simulation techniques, we study hybridiza- 
tion processes for sequences designed to limit misbond- 
ing. Hybridization involves a zipper-like mechanism, and 
reaction rates are suppressed at increased temperature 
due to the instability of initial contacts. We then con- 
sider repetitive sequences, finding that alternative path- 
ways to duplex formation, which we name 'inchworm' 
and 'pseudoknot' internal displacement, can significantly 
accelerate association. Finally, we demonstrate that the 
instability of initial contacts and alternative pathways to 
assembly can lead to sequence-dependent hybridization 
rates, in agreement with some recent experiments.^ 



I. MODEL AND METHODS 

A. A coarse-grained model 

OxDNA is detailed in Refs.|27l |2pand^, and in Ap- 
pendix [Aj The version used for the majority of this work 
is given in Ref. [28} and code implementing it is avail- 
able for download.l^ A model strand is a chain of rigid 
bodies, each one representing a nucleotide. Nucleotides 
have one interaction site for the backbone and two for 
the base. The potential energy of the system includes 
terms for backbone connectivity, base-pairing, stacking 
and excluded volume interactions. 

Base-pairing interactions are only included between 
complementary pairs A-T and G-C to reproduce Watson- 
Crick specificity. For much of this work, we use a parame- 
terization with no further sequence dependence^^ to high- 
light generic properties that can be obscured by sequence- 
dependent effects. For sequence-dependent thermody- 
namics, we use a parameterization in which hydrogen- 
bonding and nearest-neighbour stacking strengths de- 
pend on the identity of the bases.^ Both parame- 
terizations were fitted to oligonucleotide melting tem- 
peratures predicted by SantaLucia's nearest-neighbour 
model, ^ as well as the structural and mechanical prop- 
erties of double- and single-stranded DNA. The model 
was fitted to experiments performed at [Na^] = 0.5 M, a 
high salt concentration at which the strength of screen- 
ing justifies incorporating electrostatic repulsion into a 
short-ranged excluded volume. 

B. Simulation techniques 

The majority of simulations in this work were per- 
formed using a Langevin Dynamics (LD) algorithnP^. 
Langevin approaches represent an implicit solvent by 
augmenting the Newtonian equations of motion with 
drag and random noise forces. Simulated particles then 
undergo diffusive motion, and the whole system samples 
the canonical ensemble if the relative sizes of the drag and 
noise forces are chosen appropriately.^^ Details of our im- 
plementation are given in Appendix |B 1| As is common 
in simulations of coarse-grained models, we use a higher 
diffusion coefficient than for physical DNA. The freedom 
to accelerate diffusion is an advantage of coarse-grained 
models, which also tend to exaggerate the speed with 
which processes occur by smoothing energy landscapes 
on a microscopic scale. As a result, they can be used 
to study even more complex systems than would other- 
wise be expected. The price to pay is that only relative 
rates are physically meaningful, and we will focus on such 
relative rates in this work. 

To check that the results reported here are not 
overly sensitive to the details of the simulation method 
and choice of friction constants, hybridization of non- 
repetitive duplexes was also simulated with diffusion co- 
efficients reduced by a factor of 10. The results (Table 
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III , discussed in Appendix B 1 ) are qualitatively simi- 



lar, except of course for an overall drop in the reaction 
rate with slower diffusion. For the sequence-dependent 
results at the end of this work, an alternative B rown ian 
thermostat^^ was used as detailed in Appendix B^ As 
we will show, the Brxownian algorithm (with a larger 
diffusion coefficient than the LD approach) produces be- 
haviour consistent with the predictions of the LD ther- 
mostat, further evidence that the qualitative results of 
this work are not sensitive to the simulation method. 

To obtain good statistics for reaction transitions, which 
are dominated by ra re events, we used Forward Flux 
Sampling (FFS).'^^^!! This technique facilitates sampling 
of a complex transition path ensemble by splitting a rare 
event into several stages that are easier to measure. De- 
tails of the application of FFS in this work are provided in 
Appendix |B 1 a Finally, simulations performed to mea- 
sure equilibrium averages, rather than dynamics, were 
performed with an efficient cluster-move Monte Carlo 
algorithm,E3 with the addition of umbrella sampling.!^ 
Details are given in Appendix |B 3[ 



II. RESULTS AND DISCUSSION 

A. Hybridization of non-repetitive sequences 

We first consider the hybridization of a 14-base duplex 
deliberately designed to limit non-intended base-pairing. 
The sequences of the two strands are: 

• 5'- TAT CTG GCT TGT CO - 3^ 

• 5'- CGA CAA GCC AGA TA - 3^ 

Simulations were run at a range of temperatures, from 
300 K to 340.9 K, the latter being approximately the 
melting temperature of the strands at the concentration 
used. Details of the simulations are provided in Ap- 
Additional simulations at 300 K were per- 
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formed in which only native (those expected in the full 
14-base pair duplex) base pairs were assigned a non-zero 
hydrogen-bonding energy, to determine the effect of non- 
native base pairs. 

Qualitatively, binding events start with the formation 
of a base pair between complementary bases after the 
strands have diffused into contact. In successful bind- 
ing events, more base pairs subsequently form before this 
base pair breaks, a process postulated elsewher e! -^^ I -^-^ * ^^ * ^^ 
and known as 'zippering'. A typical pathway is illus- 
trated in Fig. [l](a)-(e). In some cases, initial base pairs 
are non-native, with native base pairs replacing them 
later. From Fig. [ij it is clear that zippering involves 
relatively unstructured single strands coming together to 
form base pairs in stages. Initial contacts can form be- 
tween any bases but have a bias towards those at the end 
of the strands, although initial contacts in the center are 
more likely to proceed to the full duplex once formed (see 
Fig. ^. 



Fig. [T](f) shows that the binding rate decreases with 
increasing temperature. In the commonly used Arrhenius 
model of reaction kinetics, the association rate /Con de- 
pends on the temperature T as /Con = koex.-p{—Hg,/RT), 
where Hg, is a constant activation enthalpy, and ko is a 
constant rate. A system with a single well-defined transi- 
tion state would indeed follow this prediction. A decrease 
in kon with T suggests a transition state with a negative 
enthalpy with respect to the unbound state. Overall, 
however, the Arrhenius model is a poor fit to our re- 
sults (see Fig. [7|(a)). The apparent activation enthalpy, 
which can be inferred from the slope — ^(i/t) ^^^on, be- 
comes more negative with temperature, with our model 
showing an apparent Hg,{T) ranging from around ~ 
— 4.5 kcalmol"^ to approximately — 12.5 kcalmol"^ as T 
rises from 300 K to 340.9 K. These values are similar to 
those measured for short RNA oligomers, which range 



from — 5kcalmor 



to —9 to —18 kcalmol" 



In contrast to association, we expect a relatively 
large positive activation enthalpy of dissociation because 
breaking a fully formed duplex involves disrupting many 
enthalpically favoured bonds (see e.g. Fig. [s] ). This 
explains why experimental measurements find dissocia- 
tion rates /Coff that increase exponentially with increas- 
ing temperature.^ ^^^^^^ Indeed, from calculations of the 
equilibrium constant Keq = ^on/^oflP^, we find that /Coff 
changes by about ~ 10^ over the range 300-340.9 K. Here 
we focus on the more subtle behaviour of the association 
rate, which is especially relevant for non-equilibrium pro- 
cesses in DNA nanotechnology. 

FFS allows us to sample from the ensemble of transi- 
tion pathways. At 300K, states involving two relatively 
well-formed base pairs have a 33% probability of reaching 
the full duplex, while at 340. 9K, this success rate drops to 
just 8% (Table XIII). Even for systems with only native 
base-pairing, states with two relatively well-formed base 
pairs still only progress to the full duplex in 65% of cases 
at 300 K. The fact that states with some base-pairing 
can fail to form a duplex explains the negative activation 
enthalpy: our effective 'transition state' is enthalpically 
stabilized by base-pairing. Further, the typical number 
of base pairs in this 'transition state' increases with tem- 
perature, as more base-pairing is required to make duplex 
formation probable. The reasons for the temperature de- 
pendence include: 1) the state with two base pairs itself 
becomes less stable, and 2) new bonds are less likely to 
form because a) strands become more unstructured and 
b) forming new base-pairs generates a smaller free-energy 
gain. As a result, the activation enthalpy becomes more 
negative with temperature and there is no single tran- 
sition state with well-defined properties, explaining the 
non- Arrhenius behaviour. 

Simple thermodynamic considerations at the level of 
secondary structure do not explain why many initial con- 
tacts fail to completely hybridize. For example, free- 
energy profiles of the duplex states (see e.g. Fig. |8| sug- 
gest that adding a single base pair reduces the free-energy 
of the system by 0.6 kcalmol"^ or l.Oi^T at 340.9 K, and 
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FIG. 1. (a)-(e) Stages of DNA hybridization, taken from a typical single trajectory. Green spheres indicate the 5^ end of 
each strand. Schematic diagrams underneath indicate the base pairs present in the system, (f) Hybridization rates kon{T) of 
non- repetitive duplexes as a function of temperature T, relative to A:on(300), shown as red crosses connected by a solid line. 
Also shown (blue square) is the result for duplex formation at 300 K when misaligned bonds are forbidden. 



l.Tkcalmor^ or 2.9RT at 300 K. This argument sug- 
gests that the process should be favourable once the first 
base-pair has formed. To understand why this reasoning 
fails we compare configurations obtained from hybridiza- 
tion simulations to configurations with the same degree of 
base-pairing taken from equilibrium duplex simulations. 
In Fig. |2](a) and (b) we show two configurations with 
the same base-pairing and overall interstrand enthalpy, 
with panel (a) obtained from a simulation of association 
initiated in the unbound state and panel (b) obtained 
from equilibrium simulations of the bound state (as de- 
tailed in Appendix C5). Clearly the latter has a much 
more favourable spatial conformation for full duplex for- 
mation, since less rearrangement is required. A thorough 
analysis (Appendix D 2 ) confirms that states with a cer- 
tain number of base pairs found in assembly simulations 
are on average different from those in equilibrium simula- 
tions, and clearly less conducive to full duplex formation. 
For example, the bases are further away from their native 
partners (see Table XIV). 

The above argument requires that the breaking of the 
initial contacts can occur faster than strands equilibrate 
in the configuration space available given the existence 
of those contacts. This is plausible because the single 
strands are disordered. Thus the system appears non- 
Markovian when analysed only in terms of secondary 
structure: a given state has memory of whether it is 
accessed during assembly transitions or accessed from 
the bound ensemble. We stress that to observe such 
non-equilibrium effects, it is crucial to treat the three- 
dimensional structure of single and double strands prop- 
erly. 



B. Hybridization of repetitive sequences 

Having studied strands in which non-native interac- 
tions are minimal, we now consider the limit of repeti- 
tive sequences that can form many misaligned structures 
(but no intrastrand hairpins). The sequences of the two 
strands are: 



n- 
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FIG. 2. Two configurations with two base pairs, and an in- 
terstrand enthalpy of ^ — 7.15 kcalmoP^. The configuration 
in (a) is taken from a simulation of association, and that in 
(b) from an equilibrium simulation of the system. 



• b'- ACA GAG AGA GAG AG - 3^ 

• 5'- GTG TGT GTG TGT GT - 3^ 

At 300 K we find a number of metastable structures in 
addition to the fully-bound duplex. These structures in- 
volve misaligned duplexes, which we label by their 'reg- 
ister'. A register of r corresponds to bases pairing with 
a partner offset by r bases in the 5' direction from their 
native partner. We find two basic classes, as outlined 
below. 

• Purely misaligned structures with the maximum 
number of base pairs given their register. A config- 
uration with the maximal bonding for register —8 
is illustrated in Fig. |3](a). 

• 'Pseudoknot' structures,^ characterised by two reg- 
isters ri and r2. If we label nucleotides by their po- 
sition on the strand (in the 5' — 3' direction) then in 
a pseudoknot the index of bases involved in pairing 
on one strand is a non-monotonic function of the 
index of their partner on the other strand. A typ- 
ical metastable structure involving registers 6 and 
—6 is illustrated in Fig. [3](b). 
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These metastable structures can be relatively slow to 
relax into either the fully-formed duplex or dissociated 
single strands. FFS is not efficient when intermediates 
with long lifetimes are present. We therefore initially 
measured the rates at which strands formed a misaligned 
structure of at least four base pairs, or a number of the 
more stable pseudoknot states, at 300 K. Further FFS 
simulations were performed to establish the eventual fate 
of a number of metastable states. Details are provided 



in Appendices C2, C3 and C 4 



Strands can initially associate through zippering in a 
range of registers (Table |XV|). The rate of formation of a 
given register is approximately proportional to the num- 
ber of base pairs in that register, as a result of the num- 
ber of possible initial contacts. From this point, systems 
with incomplete base pairing tend to rearrange into reg- 
isters with a greater degree of base pairing. We describe 
these rearrangement processes as 'internal displacement', 
as they involve the formation of a secondary double helix 
of an alternative register that competes for base-pairing 
with the ffist. This is analogous to the well known 'strand 
displacement' process in which an invading strand re- 
places another within a duplex, except that in this case 
only two strands are involved (in a sense, a strand dis- 
places itself). Two dominant rearrangement processes 
are observed. 

• 'Inchworm' displacement (Fig. |3](c)): thermal fluc- 
tuations allow base pairs from an alternative regis- 
ter to form. The result is a 'bulge' loop .1^ Generally, 
this (unfavorable) bulge is resolved by breaking the 
newly- formed base pairs in the alternative register. 
Occasionally, however, further base pairs are bro- 
ken in the original register and additional base pairs 
in the new register form. The bulge can thus be 
passed through the original duplex in an inchworm 
fashion, allowing the new register to displace the 
old. 

• 'Pseudoknot' displacement (Fig. [3](d)): short 
misaligned duplexes have two long single-stranded 
tails. These tails can bind, resulting in a pseudo- 
knot. The new register can compete for base pairs 
with the old, potentially displacing it. In some 
cases, such as the pseudoknot —6,6 in Fig. Kb), 
neither arm of the pseudoknot can fully displace 
the other and some degree of spontaneous melting 
is necessary. One of the arms in the pseudoknot 
can also be displaced (in an inchworm fashion) by 
an alternative register. 

Accurately measuring the transition rates between all 
registers is impractical due to the enormous number of 
possibilities and the large range of transition rates. Over- 
all, however, initial alignments with more than 4 base 
pairs tend to undergo internal displacement to more 
strongly bound states, eventually reaching the full du- 
plex. Misaligned duplexes of 4 base pairs frequently de- 
tach or undergo rearrangement. Internal displacement by 



a register with fewer base pairs than the original regis- 
ter is suppressed by the free-energy cost of breaking base 
pairs, although it is occasionally observed. 

At the low concentrations typical of experiment, the 
time spent in metastable intermediates is negligible com- 
pared to the typical time between attachment events. 
Metastable states then provide alternative pathways for 
the second-order process of association, increasing the 
rate constant for binding: in our case, by a factor of five 
for the repetitive sequences (see Appendix D3). Repet- 
itive sequences were also studied at 340.9 K. At these 
temperatures, short duplexes melt quickly and hence the 
probability that metastable structures are able to rear- 
range is reduced. Consequently the rate of formation of 
the fully-formed duplex falls off slightly faster with tem- 
perature than for the non-repetitive sequence: by a factor 
~ 9 over the temperature range 300-340.9 K rather than 
~ 5. Internal displacement therefore provides another 
possible contribution to negative activation enthalpies in 
DNA duplex formation. 



C. Sequence-dependence of binding rates 

We have established two key facts: that initial con- 
tacts frequently dissociate before forming a full duplex, 
and that misaligned bonding can accelerate duplex for- 
mation through internal displacement. Both findings 
suggest possible mechanisms for sequence-dependence of 
DNA binding rates. 

• In DNA, G-C base pairs are more stable than A-T. 
Hence, initial contacts between GC-rich sequences 
should be more stable, and more likely to zip up fol- 
lowing initial contact. If initial contacts form at ap- 
proximately the same rate, duplexes with a greater 
density of G-C base pairs should form faster. 

• The number, stability with respect to dissociation 
and ease of internal displacement of misaligned 
metastable states will vary greatly from sequence 
to sequence. Increasing any of these factors should 
result in faster duplex formation. 

Systematic experimental investigations of the sequence 
dependence of DNA binding rates are not available. 
Zhang and Winfree,*^ however, have probed sequence- 
dependent binding rates indirectly through toehold- 
mediated strand displacement. In the limit of long toe- 
holds, the authors identified the displacement rate with 
the binding rate of toehold sequences. Their data showed 
a significant difference between the binding rates of GC- 
rich and AT-rich toeholds, with GC-rich toeholds causing 
faster displacement. To explore whether these differences 
can be attributed to the factors outlined here, we have 
simulated the association of 8-base duplexes using the se- 
quences from Ref. 9 , as given in Table XIX These sim- 



ulations were performed using the Brownian thermostat 
with the sequence-dependent version of the model (more 
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FIG. 3. (a) Misaligned bonding in register —8. (b) Pseudoknot bonding, in registers 6 and —6. Examples of internal displace- 
ment through (c) the inchworm mechanism and (d) the pseudo knot mechanism, (c.i) The system is initially bound through 
10 base pairs in register 4. (c.ii) Due to thermal fluctuations, three base pairs from register form at the expense of three base 
pairs from register 4, resulting in a 'bulge', (c.iii) More base pairs in register form at the expense of register 4. The bulge is 
passed down the helix, (civ) Eventually, the final base pair in register 4 breaks and register is able to form all its possible 
base pairs, (d.i) A strand initially in register 10. (d.ii) Additional base pairs (from the correctly aligned register 0) form, (d.iii) 
Further base pairs form in register at the expense of register 10. (d.iv) Eventually, the final base pair in register 10 breaks 
and a fully-bonded, correctly aligned duplex can form. 



details are provided in Appendices B2 and C6). The 



association rates (Table XIX) are in reasonable agree- 
ment with Ref. |9l The GC-rich sequence forms duplexes 
fastest, followed by the average- strength sequence and 
finally the AT-rich sequence. The GC-rich sequence is 
faster than the AT-rich analogue by a factor of 7.4: Zhang 
and Winfree found a factor of 15 in the long- toehold limit. 
Interestingly, part of the factor of 7.4 in our model can 
be attributed to correctly-aligned G-C base pairs being 
more stable than A-T equivalents, and part to an in- 
creased probability of binding via internal displacement. 
This conclusion follows from simulations in which inter- 
nal displacement was suppressed by only allowing native 
base pairs: in this case, the GC-rich sequence was only 
3.2 times as fast as the AT-rich variant. 



III. CONCLUSIONS 

We have studied DNA hybridization using a coarse- 
grained model, oxDNA, that was carefully optimised to 
represent both single and double-stranded DNA. Stiff, 
helical duplexes form in a realistic fashion from flexible 
single strands. By capturing these generic features of 
DNA we predict a complex ensemble of transition path- 
ways for association, without a single transition state 
with well-defined properties, and qualitatively distinct 
dynamics for different sequences. The association of a 
duplex occurs through the formation of initial contacts 
involving a small number of bases, followed by zippering 
of the remainder, as suggested previously. We go 



beyond this classic picture to show that initial contacts 
often dissociate, despite non-negligible attractive interac- 
tions, because their configurations are not conducive to 
full duplex formation and strands can detach before they 
equilibrate within the space of configurations defined by 
the secondary structure of the initial contacts. Thus hy- 
bridization can fail even for interaction enthalpies which, 
if accessed from the equilibrium duplex ensemble, would 
overwhelmingly lead to duplex reformation. 

Increasing the temperature destabilizes initial con- 
tacts, and lowers the drive to form more base pairs. The 
overall rate of association therefore decreases with tem- 
perature, resulting in a negative activation enthalpy if 
the results are interpreted through an Arrhenius model. 
At variance with the Arrhenius model, however, the ef- 
fective activation enthalpy becomes more negative with 
increasing temperature, consistent with the fact that the 
strength of the initial contacts that are necessary to en- 
sure duplex formation increases with temperature. Thus 
the system does not possess a single, well-defined 'tran- 
sition state' but a complex ensemble of transition path- 
ways. This ensemble of pathways is further complicated 
by non-native interactions, which mean that systems can 
first form misaligned duplexes, and subsequently undergo 
internal displacement (rearranging without detaching) 
via inchworm or pseudoknot mechanisms to reach the 
fully formed duplex. As shown by the study of 8-base du- 
plexes, sequences need not be perfectly repetitive for this 
pathway to be relevant. At low reactant concentrations, 
these alternative pathways accelerate association. Due to 
the principle of detailed balance, dissociation must also 
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occur via internal displacement pathways, as well as via 
direct melting. We note that for longer strands, the prob- 
ability of binding in a misaligned fashion is higher. We 
would therefore expect these mechanisms to contribute 
strongly to the association of longer strands. 

If initial contacts frequently fail, stronger contacts 
should prove more likely to succeed and thereby acceler- 
ate reaction rates. The rate of duplex formation through 
internal displacement will also depend strongly on the 
sequence. We tested the impact of these two proposed 
causes of sequence-dependent reaction rates and demon- 
strated their effect for short duplexes, finding agreement 
with experimental data. We predict that systematic 
studies of hybridization rates for sequences of varying 
GC content, and for sequences of equal overall binding 
strength but different degrees of repetitiveness will find 
that increasing GC content and repetitiveness accelerates 
duplex formation. We note that to resolve the effects 
clearly, care may have to be taken to avoid competing 
single- stranded hairpins. 

It is worth contrasting our results with those found for 
3SPN.1, an import ant mod el that has also been used to 
study hybridization.lSSmiilMolxh ese authors also observe 
complex kinetics, but with significant differences. In par- 
ticular, for non-repetitive sequences, the authors claim 
that duplexes typically form non-native contacts, before 
'snapping' into the duplex state.*^ Other studies with the 
same model have found that the strands 'wind' to form 
a double helix, then 'slide' past each other to reach ap- 
propriate alignment. Repetitive sequences form mis- 
aligned structures, which relax int o the fully-formed du- 
plex by 'slithering' past each other.^^^I^ By contrast, the 
basic mechanism of duplex formation in oxDNA follows 
a clear nucleation and zippering pathway. Zippering oc- 
curs as bases from the relatively disordered single strands 
successively stack onto the growing duplex. Slithering, 
like internal displacement, allows misaligned duplexes to 
relax to the fully base-paired structure. The mecha- 
nisms, however, are quite distinct: internal displacement 
involves the formation of two separate, base-paired du- 
plex regions that compete for bases, whereas slithering 
involves the sliding of strands past each other in a process 
'devoid of significant energy barrier s'.l^ Both inchworm 
and pseudoknot mechanisms rely on the fiexibility of the 
single strands and hence will be suppressed in 3SPN.1. 
In oxDNA, slithering is suppressed as it would require 
the system to pass through a double-helical state with 
no base-pairing. A more extensive explanation of the 
different mechanisms observed for 3SPN.1 and oxDNA is 
given in Appendix [E) 

OxDNA is a simplified model, and it is therefore ap- 
propriate to evaluate the robustness of our conclusions. 
Firstly, the zipper-mechanism by which duplexes form 
relies on the existence of attractive interactions between 
bases, and the fact that the transition involves fiexible 
single strands forming a stiff, helical duplex. These are 
generic features of DNA that are well reproduced by 
oxDNA, and hence the conclusion is likely to be reliable. 



Secondly, the internal displacement mechanisms iden- 
tified involve kinked and pseudoknotted intermediates 
that are well-e stablished motifs in nucleic acid secondary 
structure.^ Moreover, oxDNA describes the kinetics of 
conventional strand displacement involving three strands 
well.^^ It is therefore likely that these pathways exist for 
real DNA. Thirdly, the frequent failure of initial con- 
tacts to form duplexes, despite the expected thermody- 
namic stability of extra base pairs, also relies on the well- 
established differences between single strands and du- 
plexes. Nevertheless, it should be kept in mind that other 
contributions to the overall activation enthalpy may need 
to be taken into account. For example, we have not at- 
tempted to model the decrease of the viscosity of water 
with temperature, which may accelerate duplex forma- 
tion at higher temperatures. Microscopic barriers such 
as the disruption of solvating water molecules prior to 
hydrogen-bond formation have also not been explicitly 
treated. In Appendix |D 2| we show that initial contact be- 
tween strands involves states with less intrastrand stack- 
ing on average than in the unbound ensemble. Breaking 
stacking helps the strands to be in contact without being 
fully bound. This tendency contributes positively to the 
activation enthalpy - in our model, this effect is smaller 
than the competing negative contributions, but the rela- 
tive size may be different in nature. 



Experimental studies do not currently provide a consis- 
tent picture of hybridization kinetics. In particular, it is 
not clear whether the rate of duplex formation typically 
increases or decreases with temperature (corresponding 
to positive or negative activation enthalpies respectively) . 
We have, however, provided a physically reasonable jus- 
tification for negative activation enthalpies; that initial 
contacts are surprisingly likely to detach because the 
overall configuration of the two strands is not conducive 
to full duplex formation. This argument explains why, 
in Markov models constructed at the base-pair level, one 
needs to either postulate a "barrier" to full duplex for- 
mation after the first few base pairs have formed, or to 
make short sections of duplex detach very quickly.'^^'^ 
Our simulations also suggest that if the failure of initial 
contacts is the cause of a negative activation enthalpy, 
precise experiments should show that this enthalpy be- 
comes more negative with increasing temperature. In- 
ternal displacement can also enhance negative activation 
enthalpies. 



By providing insight into the complex mechanisms by 
which two DNA strands associate, we can thus suggest 
ways to modulate strand association rate and so choreo- 
graph the assembly and operation of DNA nanotechnol- 
ogy. Future work will consider the role of single-stranded 
hairpins in determining reaction kinetics, and the conse- 
quences of internal displacement for the association of 
longer strands. 
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Appendix A: The DNA model 

OxDNA and its interaction potentials have been de- 
scribed in detail elsewhereP^^^ The model represents 
DNA as a string of nucleotides, where each nucleotide 
(sugar, phosphate and base group) is a rigid body with 
interaction sites for backbone, stacking and hydrogen- 
bonding interactions. The potential energy of the system 
can be decomposed as 



V = 



E 

iij) 



b. + V'stack + K> 
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^coaxial stack 



^backbone 




^stack 



^cross stack 



FIG. 4. A model DNA duplex, with stabilising interactions de- 
picted schematically. The backbone sites are shown as spheres, 
the bases as ellipsoids. Backbone colouring indicates strand iden- 
tity. All nucleotides also interact with repulsive excluded volume 
interactions. The coaxial stacking interaction acts like a stacking 
interaction between bases that are not immediate neighbours along 
the backbone of a strand. Taken from Ref. 36 



HB 



(Al) 



where the first sum is taken over all nucleotides that 
are nearest neighbors on the same strand and the sec- 
ond sum comprises all remaining pairs. The interactions 
between nucleotides are schematically shown in Fig. |4] 
The backbone potential Vh.h. is an isotropic spring that 
imposes a finite maximum distance between backbone 
sites of neighbours, mimicking the covalent bonds along 
the strand. The hydrogen bonding (Vub)^ cross stacking 
(Kr.st.), coaxial stacking (Kx.st.) and stacking interac- 
tions (Fstack) are anisotropic and explicitly depend on the 
relative orientations of the nucleotides as well as the dis- 
tance between the relevant interaction sites. This orien- 
tational dependence captures the planarity of bases, and 
helps drive the formation of helical duplexes. The coaxial 
stacking term is designed to capture stacking interactions 
between bases that are not immediate neighbours along 
the backbone of a strand. Bases and backbones also have 
excluded volume interactions Fgxc or Kxc 

Hydrogen-bonding interactions are only possible be- 
tween complementary (A-T and C-G) basepairs. In the 
sequence-dependent parameterization, the strengths of 
interactions Fgtack and Vub further depend on the iden- 
tity of the bases involved. In the average model, Fstack is 
sequence-independent and Vrb is equivalent for (A-T and 
C-G) base pairs. Interactions were fitted to reproduce 
melting temperatures and transition widths of oligonu- 
cleotides, as predicted by SantaLucia's nearest-neighbor 
model. Note that our three dimensional model is signif- 
icantly more complex than the nearest-neighbour model. 
We simply treat the latter as a high-quality fit to ex- 
perimental data. Structural and mechanical properties 
of both double- and single-stranded DNA are also care- 
fully taken into account in the fitting procedure. In DNA 
the double helical structure emerges because there is a 



length-scale mismatch between the preferred inter-base 
distance along the backbone, and the optimal separation 
of bases when stacking. It is exactly this feature that 
drives the helicity of oxDNA, rather than an imposed 
natural twist on the backbone. Overall, the emphasis 
in our derivation of oxDNA was on physics relevant for 
single-strand to duplex transitions. As discussed in the 
main text, oxDNA has been extensively tested for other 
DNA properties and systems to which it was not fitted. 
Our success in describing all these phenomena gives us 
confidence to use it to study the dynamics of hybridiza- 
tion. 

OxDNA was fitted to reproduce DNA behavior at 
salt concentration [Na^] = 0.5M, where the electrostatic 
properties are strongly screened, and it is reasonable to 
incorporate them into a short-ranged excluded volume. 
The model therefore contains no further explicit elec- 
trostatic interactions. It should be noted that OxDNA 
neglects several features of DNA structure and interac- 
tions due to the high level of coarse-graining. Specifi- 
cally, the double helix in the model is symmetrical rather 
than the grooves between the backbone sites having dif- 
ferent sizes (i.e., major and minor grooving), and all four 
nucleotides have the same structure. These differences 
with real DNA mean that oxDNA will not be able to 
treat phenomena that depend sensitively, for example, 
on anisotropic elasticity, explicit salt ion effects, or the 
existence of major and minor grooving. However, these 
specific properties of DNA are unlikely to be critical to 
the general arguments we are making about hybridiza- 
tion in this paper. Rather, it is the correct treatment of 
the basic mechanical properties of both single and dou- 
ble strands, together with the basic physics of hydrogen 
bonding and stacking that determines the emergent phys- 
ical phenomena we are trying to describe. 

We assume that the partial pressure of DNA in di- 
lute solution is negligible relative to that of our implicit 
solvent. In this limit, it is appropriate to consider the 
coarse-grained DNA strands in the canonical ensemble, 
even when comparing to experiments performed at con- 
stant pressure. We therefore use the terms enthalpy and 
energy interchangeably, with enthalpy most natural for 
comparisons with the experimental literature, and energy 
most natural when discussing the implementation of the 
model. 



Appendix B: Simulation Methods 

The thermodynamic properties of model DNA are 
given by averaging over the Boltzmann distribution 



(Bl) 

In this equation H is the system Hamiltonian, r and q 
are positional and orientational coordinates and p and 
L are linear and angular momenta. As terms in H con- 
taining and are separable and can be analytically 
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integrated, the relative probability of a configuration is 
given by the Boltzmann factor of its potential energy, 
exp(-/3F(r^,q^)). 

Inferring model kinetics necessitates an additional 
choice of dynamical algorithm. In this work we use 
Langevin Dynamics (LD) and a Brownian thermostat 
to measure dynamical properties. Virtual Move Monte 
Carlo (VMMC) is also used to calculate thermodynamic 
averages. For the dynamical algorithms, it is necessary to 
define a nucleotide mass which is taken as m = 315.75 Da 
for all nucleotides.^^ For dynamical purposes, we treat 
the nucleotides as spherical with a moment of inertia 
31.586 Da nm^.l^ The specification of mass, length and 
energy scales in the model together imply a time scale. 
For completeness, we will quote results in the supple- 
mentary material terms of this time scale, although as 
discussed in the main text only relative times are physi- 
cally meaningful. 



Langevin Dynamics 



LD is a formalism for including random and dissipa- 
tive forces due to an implicit solvent in a self-consistent 
manner so that solute particles move diffusively and the 
system samples from the Boltzmann distribution. New- 
ton's equations of motion for the solute particles can be 
augmented with these forces and integrated to give dy- 
namical trajectories. The results reported in this work 
were obtained using the quaternion-based algorithm of 
Davidchack et al)^ To use LD, it is necessary to spec- 
ify a friction tensor relating the drag forces experienced 
by a particle to its generalized momenta. We treat each 
nucleotide's interaction with the solvent as spherically 
symmetric, simplifying the friction tensors and leaving 
only two independent quantities, the linear and rota- 
tional damping coefficients 7 and F. We choose values of 
7 = 0.59 ps~^ and F = 1.76 ps~^. These values produce 
overall diffusion coefficients of I^sim = 1.91 x 10~^ m^s~^ 
for a 14 base-pair duplex, higher than experimental mea- 
surements of i^exp = 1-19 X 10~^^ m^s""*^.^ As discussed 
in the main text, accelerated diffusion is an advantageous 
aspect of coarse-grained modelling, allowing the simula- 
tions to access more complex processes. We show in Ta- 



ble III that using higher friction constants for the simple 
case of a non-repetitive sequence at 300 K slows down hy- 
bridization, but does not qualitatively affect our results 
otherwise: in particular, the tendency for initial contacts 
not to proceed to full duplex formation is preserved. LD 
Simulations in this work use a time step of 8.55 fs. This 
time step has been previously shown to reproduce the 
energies and kinetics of shorter time steps for the DNA 
model. 



a. Forward flux sampling 

'Brute force' Langevin simulations are not always effi- 
cient enough to sample rare transitions. Forward ffux 
sampling (FFS) allows the calculation of the ffux be- 
tween two local minima of free energy, and also samples 
from the trajectories that link the two minima {reactive 
trajectories).'^'^ Here we present a brief discussion of the 
FFS method in general. Our particular implementation 
will be discussed later. 

The term 'ffux' from (meta) stable state A to state B 
has the following definition. 

Given an infinitely long simulation in 
which many transitions are observed, the 
fiux of trajectories from A to B is ^ab = 
^AB /{'TfA)^ where Nab is the number of 
times the simulation leaves A and then 
reaches 5, r is the total time simulated and 
/a is the fraction of the total time simulated 
for which state A has been more recently vis- 
ited than state B. 

The concept of fiux is therefore a generalization of a tran- 
sition rate for processes that are not instantaneous: it in- 
corporates the time spent in intermediate states between 
A and B. Subtleties relating to the inference of rates 
from our simulations are discussed in Appendix |B 4[ 

To use FFS, we require an order parameter Q which 
measures the extent of the reaction, such that non- 
intersecting interfaces, Xn_i can be drawn between con- 
secutive values of Q. Initially, simulations are performed 
that begin in the lowest value of Q (which we define as 
Q = —2), and the fiux of trajectories crossing the surface 
X^i (for the first time since leaving Q = —2) is measured. 
We define the lowest value of Q as Q = — 2 because the 
simulation procedure is distinct for Q > 0. 

The total fiux of trajectories from Q = — 2 to the alter- 
native minima {Q = Qmax) is then calculated as the fiux 
across A^^^ from Q = — 2, multiplied by the conditional 
probability that these trajectories reach Q = Qmax before 
returning to to Q = — 2. This probability can be factor- 
ized into the product of the probabilities of trajectories 
starting from the interface Ag_^ reaching the interface 
Ag^^ before returning to Q = — 2 to yield: 



Q=l 



(B2) 



In this work we use two distinct approaches to evalu- 
ating the product in equation |B2[ known as direct FFS 
and Rosenbluth FFS. Direct FFS proceeds by randomly 
loading microstates at the interface A^^^ saved during the 
calculation of the initial fiux, and using these as start- 
ing configurations from which to estimate P(Aq|A^;l) t>y 
direct simulation. The process is then iterated for suc- 
cessive interfaces, using the successful trajectories from 
the previous interface as initial configurations for the 
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reasonable. This is possible because, during the sampling 
of the flux across interface A^^, successful transitions to 
Qmax are occasionally observed for the simplest cases. 
For example, during the simulation of the 14-base strands 
in which only native contacts were permitted, six transi- 
tions were observed in 72 /is, giving a rate of ~ 8x 10^ s~^, 
consistent with the estimate of (6.23±0.47) x 10^ s~^ from 



FFS (Table XIII). 



FIG. 5. Schematic illustrations of FFS. An order parameter Q is 
defined with interfaces A separating distinct value of Q. We are 
interested in measuring the flux from Q = —2 to Q = 2 in this ex- 
ample. A) The initial measurement of flux across the interface A^-,^. 
Orange dots indicate the crossings that contribute to the flux, and 
also the states used to launch subsequent stages of simulation. B) 
Direct FFS involves randomly launching many trajectories from the 
interface A^-,^, and measuring the probability of reaching Aj before 
returning to \Z.\- This procedure is then repeated for successive 
interfaces, resulting in branched trajectories. C) Rosenbluth FFS 
grows complete reactive trajectories in isolation: for each point on 
the previous interface, a flxed number of trajectories are launched. 
If one or more successfully reach the next interface, a single case is 
chosen at random and the rest are discarded. 



next, allowing the estimation of P{\^_i\\^Z\) for all 
relevant values of Q. Thus the flux from from Q = —2 
to Q = Qmax can be calculated, and the trajectories ob- 
tained sample from the distribution of reactive trajec- 
tories. Rosenbluth FFS is an alternative approach in 
which, instead of successively performing a large num- 
ber of simulations at every interface, individual reactive 
trajectories are generated independently by performing 
a small number of simulations at each interface for each 
trajectory. If multiple attempts are successful at a given 
interface, one is chosen at random and the rest are dis- 
carded. The contrasting approaches are illustrated in 
Fig. |5j Note that extracting the flux and ensemble of re- 
active trajectories requires a re- weighting procedure for 
Rosenbluth FFS, d ue to the fact that some trajectories 
are 

discarded.!™ 

Direct FFS naturally produces branched trajectories, 
whereas Rosenbluth FFS does not. Rosenbluth sampling 
thus provides an equally good sampling of the initial 
stages of a reaction as the final stages, unlike direct FFS 
which samples the later stages in more detail. Unfortu- 
nately, however, Rosenbluth sampling is less efficient in 
obtaining reactive trajectories due to a tendency to gen- 
erate successful simulations that are then discarded (to 
avoid branching). For these reasons we used Rosenbluth 
FFS where possible, but used direct FFS for the most 
difficult simulations. 

The random error associated with FFS simulations can 
be estimated in the following way. Measurements of the 
flux across A^^ are performed using N independent sim- 
ulations. The daughter trajectories of any one of these 
initial simulations therefore give an independent estimate 
of the flux. We report the error as cr /\/N — 1, where 
is the variance of the N independent estimates. 

It is sensible to check that the FFS measurements are 



2. Brownian Thermostat 

A simple alternative to LD is to use an algorithm which 
evolves the system according to Newton's equations for 
a fixed length of time, then resample a fraction of the 
velocities and angular velocities from the Maxwell distri- 
bution. We refer to this type of algorithm as a Brownain 
thermostat, and our simulations were performed using 
the thermostat described in Ref. i41i The simulation al- 
gorithm performs Verlet integratioiP^ for a given num- 
ber of steps A^Newt, then resets the velocity of each nu- 
cleotide with probability py and the angular velocity of 
each nucleotide with a probability p^. The newly as- 
signed velocities and angular velocities are drawn from 
the Boltzmann distribution. In our simulations, we chose 
Py = 0.02, p^ = 0.0068 and A^Newt = 103. On time 
scales longer than A'Newt^^^P^;, where 5t is the integration 
time step, the dynamics is diffusive. Using St = 8.53 fs, 
14-base strands of DNA have a diffusion coefficient of 
7.6 X 10~^m^s~^ using this algorithm, higher than the 
^sim = 1-91 X 10~^m^s~^ measured for our LD algo- 
rithm. 



3. VMMC 



is a Monte Carlo technique effective for di- 
luted systems with strong, directional interactions such 
as our DNA model. The algorithm generates a series 
of configurations of a system that are drawn from the 
Boltzmann distribution. The algorithm moves from one 
configuration to the next by attempting moves of clus- 
ters that are generated in a manner that reflects local 
potential energy gradients in the system. Trial moves 
are accepted with a probability that ensures the system 
samples from the canonical ensemble. By moving clusters 
of strongly interacting particles, the algorithm is able to 
equilibrate model DNA systems much faster than simpler 
Monte Carlo algorithms. To use VMMC, it is necessary 
to select 'seed' moves of a single particle: the resultant 
energy changes are used to generate the cluster. For all 
VMMC simulations reported here, the seed moves were: 

• Rotation of a nucleotide about its backbone site, 
with the axis chosen from a uniform random dis- 
tribution and the angle from a normal distribution 
with mean of zero and a standard deviation of 0.12 
radians. 
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Translation of a nucleotide with the direction cho- 



zero and a standard deviation of 1.02 A. 



a. Umbrella sampling 

Despite the simplicity of the model and the efficiency 
of VMMC, many processes are still slow to equilibrate 
due to the presence of large free-energy barriers. These 
barriers can be artificially flatenned, and equilibration 
enhanced, b^ incorporating an additional biasing weight 



B 



qAT ^45] ^]^^g approach, known as umbrella sam- 



pling, iy(r^,q^) is chosen to favour the states of high 
free-energy, and the expectation of any variable A can be 
extracted as 



(A) 



{l/WirN,q^))w 



w 



(B3) 



Here {)w indicates sampling from the ensem- 
ble in which states have a relative probability 
W{r^, q^) exp(-/3F(r^, q^)). 



4. Considerations of metastable states, fluxes and system 
size 

FFS is not effective at simulating transitions with long- 
lived metastable intermediates, as the process of escaping 
these intermediates must be directly simulated through 
brute- force methods. Such long lived intermediates are 
present with the repetitive sequences we have studied. 
FFS can, however, be used to simulate separately the flux 
of trajectories into and out of these states. This data 
can then be used to estimate overall reaction kinetics, 
by constructing a model such as that illustrated in |6]A. 
Performing this calculation implicitly assumes that the 
system equilibrates within the metastable intermediate 
states before making another transition. 

In our work, we have employed this approximation to 
study repetitive sequences, defining a number of mis- 
bonded intermediates and calculating the fluxes between 
them. The states considered are generally long-lived, 
show limited heterogeneity in structure within a state 
and are separated by significant free-energy barriers from 
other states. Thus the quasi-equilibrium assumption is 
reasonable. 

Formation of either a metastable intermediate or the 
fully bonded state occurs in a short time scale after ini- 
tial contact has been made, relative to the overall time 
spent in the single-stranded ensemble. Thus it is sensible 
to interpret the measured fluxes from the unbound en- 
semble directly as instantaneous reaction rates. Due to 
the existence of metastable intermediates for repetitive 
sequences, however, the overall process of moving from 
unbound to fully bound ensembles is not effectively in- 
stantaneous on the time scales that strands encounter one 



and the 




rh 




mean of 






'^y'Vs^i2 


















^02 





■^01 




1^03 


Pi' 




K 


1^02 


^*(2) ' 



FIG. 6. Reaction kinetics in the presence of metastable inter- 
mediates. A) Fluxes <l> contributing to the kinetics of formation 
of bound state "3" from the unbound ensemble "0" (taken as the 
left-hand-side of the diagram), in the presence of two metastable 
intermediates "1" and "2". B) Simplified picture used to interpret 
our results, in which each bound state is formed with a certain 
rate, and states 1 and 2 convert to the fully bound state 3 instan- 
taneously with probabilities Pi and P2. The overall reaction rate 
is then /cos + koiPi + ^02^2- 



another through diffusion. In our simulations, a signifi- 
cant amount of time can be spent in misaligned registers 
or pseudoknots. In principle, this makes the definition of 
an overall reaction rate problematic. 

For computational tract ability, however, we have sim- 
ulated systems of two DNA strands with a relatively 
small volume, giving strand concentrations of ~ 100 jaM. 
This is much higher than tvpical experiments - for in- 
stance, Zhang and Winfree^ used a concentrations of or- 
der 1 nM. In our system, all interactions are short ranged 
and hence the strands behave approximately ideally un- 
less they come into close contact. Thus the effect of dilut- 
ing the system is trivial: using a cell of twice the volume 
would halve the rate at which strands came into contact. 
Dilution should not, however, affect the rate at which 
strands dissociate or internally rearrange once attached, 
and nor will it affect the probability of choosing a given 
pathway out of a misbonded configuration. Therefore, 
although in our simulations a significant amount of time 
can be spent in metastable intermediates, at the much 
lower concentrations relevant to experiment these times 
will be negligible compared to the diffusional timescales 
required to make contact. 

When we compare relative rates to experiment, the 
time required to form an initial contact should scale with 
the dilution, and thus relative rates of initial attachment 
measured in simulation are directly comparable to ex- 
periment. Any time spent in metastable intermediates, 
however, should be subtracted from the total time to 
reach a duplex from the single-stranded state in order 
to make a fair comparison. Practically, this means that 
when analysing the repetitive sequence, we simply use the 
measured fluxes out of each metastable state to calculate 
the probability that a given intermediate will progress to 
the fully formed duplex before dissociating. The overall 
reaction rate /Con is then 



(B4) 



where the sum runs over all bound states i, ki is the 
formation rate of i from the unbound ensemble and Pj 
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Order parameter 


Separation 


Nearly-formed Set of base pairs A 


Set of base pairs B 


Q 


d/mn. 


base pairs n 


with E < Ea 


with E < Eb 


Q = -2 


d > 5.11 








Q = -l 


5.11 >d> 3.42 








Q = 


3.42 > > 2.56 








Q = 1 


2.56 >d> 1.71 








Q = 2 


1.71 > d > 0.85 








Q = 3 


d < 0.85 


n = 


\A\=0 


\B\=0 


Q = 4 




n > 1 


\A\=0 


\B\ = 


g = 5 






{\A\>lk\B\=0) 


or {\A\ = lk\B\ = l) 


Q = 6 




|A| > 2 & 


\B\>lk (A^ Sr or 


B ^Srorn^ 0*) 


Q = 7 




n = 0* 


AeSr 


B eSr 



TABLE 1. Order parameter definitions for FFS simulations of binding. | A| is the number of base pairs in set A. A ^ Sr indicates that the 
set of interactions A is in the set of sets Sr that defines a target state of the simulations. For non-repetitive systems, only 5*0 is relevant. 
For repetitive sequences at T = 300 K, misaligned structures with at least 4 base pairs (14 — |ri| > 4) are considered, and the pseudoknots 
{ri,r2} = {—4,8}, {—6,6}, {—6,8}, {—8,4} and {—8,6}. For repetitive sequences at T = 340.9 K, only registers with at least eight base 
pairs are considered (14 — |ri| > 8), and pseudoknots are not. n = 0* indicates that nearly formed base pairs are forbidden, except in 
pseudoknot cases when nearly- formed base pairs from the relevant registers are allowed. The symbol indicates that no restriction is 
placed on this generalised coordinate except those that follow implicitly from the other requirements. We use two energy cutoffs to monitor 
base-pairing: Ea — — 1.43 kcal mol~^ and Eb = — 1.79kcalmol~^. 



300 



312.5 



Temperature /K 
326.1 



340.9 



300' 



Number of simulations 
for flux across A°i 



20 



20 



20 



20 



20 



Initialization time 
per simulation /ns 



860 



860 



860 



860 



860 



Crossings of A°i 
(time taken/ /xs) 



8106 (75) 



8059 (69) 



7948 (58) 



8041 (50) 



7854 (72) 



Flux across X^i 
//is-i 



108 ±4 



116 ±2 



138 ±5 



160 ±8 



109 ±4 



Total trajectories 
started from A° i 



12000 



4000 



4000 



4000 



6000 



Target interface 
A? 



A4 



0.427 ±0.005 (5) 
0.496 zb 0.003 (5) 
0.521 =b 0.004 (5) 
0.534 ±0.004 (5) 
0.146 ±0.002 (20) 
0.252 ± 0.01 (5) 
0.326 ± 0.02 (5) 



Success probability (attempts per trajectory) 



0.420 ± 0.009 (5) 
0.517 ±0.006 (5) 
0.528 ± 0.008 (5) 
0.567 ±0.008 (5) 
0.154 ±0.004 (40) 
0.182 ±0.01 (10) 
0.265 ± 0.03 (10) 



0.418 ± 0.008 (5) 
0.495 ± 0.007 (5) 
0.524 ±0.006 (5) 
0.580 ±0.009 (5) 
0.165 ±0.003 (40) 
0.118 ±0.008 (10) 
0.181 ±0.02 (10) 



0.427 ±0.008 (5) 
0.514 ±0.008 (5) 
0.545 ± 0.006 (5) 
0.609 ±0.008 (5) 
0.161 ±0.004 (40) 
0.0908 ±0.009 (10) 
0.0778 ± 0.01 (10) 



0.431 ± 0.006 (5) 
0.504 ± 0.004 (5) 
0.514 ±0.006 (5) 
0.275 ±0.006 (5) 
0.0685 ± 0.003 (20) 
0.418 ±0.02 (5) 
0.648 ± 0.02 (5) 



Total reactive 
trajectories found 



1535 



608 



485 



244 



829 



TABLE II. Details of Rosenbluth FFS simulations performed to estimate the kinetics of binding for non-repetitive sequences 
at a range of temperatures (300' corresponds to simulations in which only native bonds had a non-zero interaction). The top 
half of the table reports the initial simulations of flux across the X^_i interface. The bottom half contains the simulation data 
from the later stages: the total number of trajectories initiated at X-i^ the success rates of attempts to reach the next interface 
and the number of attempts per trajectory performed at each interface. 
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Friction coefficients 

7 /ps-\ r /ps-i 


0.59, 1.76 


5.9, 17.6 




Temper at ure/K 
300 340.9 


/ m^s-i 


1.91 X 10~^ 


1.91 X 10~^° 


for flux across X^_i 


50 


50 


for flux across X^_i 


20 


20 


Initialization time 
per simulation /ns 


860 


860 


Initialization time 
per simulation /ns 


860 


860 


Crossings of X^i 
(time taken//is) 


19776 (186) 


19973 (133) 


Crossings of X^i 
(time taken//is) 


8106 (75) 


7734 (479) 


Flux across X^i 


106 ±2 


150 ±3 


Flux across X^i 


108 ±4 


16.2 ±0.5 


Total trajectories 
loaded from X^i 


15000 


6000 


Total trajectories 
loaded from X^i 

Target interface 

xh 

A? 

M 


12000 6000 

Success probability 
(attempts per trajectory) 
0.427 ± 0.005 (5) 0.394 ± 0.005 (5) 
0.496 zb 0.003 (5) 0.476 ± 0.006 (5) 
0.521 ± 0.004 (5) 0.513 ± 0.007 (5) 
0.534 ± 0.004 (5) 0.630 =b 0.005 (5) 


Target interface 

Ai 

A| 
At 
At 
M 


Success probability 
(attempts per trajectory) 
0.434 ± 0.004 (5) 0.417 ± 0.004 (5) 
0.505 ± 0.003 (5) 0.504 ± 0.005 (5) 
0.519 ± 0.003 (5) 0.532 ± 0.005 (5) 
0.633 ± 0.003 (5) 0.707 ± 0.006 (5) 
0.185 ±0.002 (20) 0.210 ±0.003 (40) 
0.442 ± 0.007 (5) 0.211 ± 0.009 (10) 
0.642 ± 0.009 (5) 0.143 ± 0.009 (10) 


At 

Ai 


0.146 d= 0.002 (20) 
0.252 ±0.01 (5) 
0.326 ±0.02 (5) 


0.258 ± 0.003 (20) 
0.190 ±0.008 (5) 
0.267 ±0.02 (5) 


Total reactive 
trajectories found 


5523 


1207 



Total reactive 1535 584 

trajectories found 



TABLE III. Comparison of Rosenbluth FFS simulations of 
hybridization for non-repetitive sequences at 300 K for im- 
plementations of LD with different friction constants. 7 = 
0.59 ps~^ and F = 1.76 ps~^ are the standard values used in 
this work. The top half of the table reports the initial sim- 
ulations of flux across the A^i interface. The bottom half 
contains the simulation data from the later stages: the to- 
tal number of trajectories initiated at A?.i, the success rates 
of attempts to reach the next interface and the number of 
attempts per trajectory performed at each interface. 

is the probability that such a state will convert to the 
fully-formed structure before dissociating. We illustrate 
this analysis schematically in Fig.[6]B. For the sequence- 
dependent study, we simply subtract the time in which 
the two strands have one or more base pairs from the 
total time that it takes to reach the fully-bound state. 
We thus compensate for the fact that rearrangement is a 
significant contribution to reaction times at our concen- 
trations, but not in the dilute limit. 

Appendix C: Simulation protocols 

In this section we discuss the implementation of the al- 
gorithms of Appendix [B] for the specific systems studied 



TABLE IV. Details of Rosenbluth FFS simulations performed 
to estimate the kinetics of binding for repetitive sequences at 
300 K and 340.9 K. The top half of the table reports the initial 
simulations of flux across the A^i interface. The bottom half 
contains the simulation data from the later stages: the to- 
tal number of trajectories initiated at A^i, the success rates 
of attempts to reach the next interface and the number of 
attempts per trajectory performed at each interface. 



in this work, and present some raw data from the sim- 
ulations which would allow reproduction of the results. 
Processed data is presented in Appendix [P] To facilitate 
the discussion, we introduce the following concepts. 

• Native base pairs are those that are expected to 
form in the fully-bound structure. 

• For repetitive sequences, a number of metastable 
intermediates exist. Some of these are simply mis- 
aligned structures, which can be unambiguously de- 
fined by their register: a register of ri corresponds 
to bases pairing with a partner offset by ri bases 
in the 5' direction from their native partner. For 
non-repetitive sequences, only n = is relevant. 
The maximum number of base pairs in a register 
ri is 14 — |ri|. 

• Other structures involve two registers {ri,r2} in 
a pseudoknotted configuration. We found that 
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Order parameter Set of base pairs Nearly- formed Set of base pairs Set of base pairs Set of base pairs 
Q A base pairs n B C not in Svq D not in Svq 

with E < Ea with E < Eb with E < Ea with E < Eb 



Q = 


-2 


A. G Stq 


n = B e Sro 














Q = 


-1 




{B e Sro ^ riy^O) or B ^ Sro 


iq = o 




1^1 


= 






Q = 









(|C7|>1&|Z3| 


= 0) 


or (Id = 


1 & 


\D\ 


= 1) 


Q = 


1 






i\C\ >2k \D\ 


= 1) 


or (Id = 


2 & 


\D\ 


= 2) 


Q = 


2 






( C > 3 & L» 


= 2) 


or (Id = 


3 & 


\D\ 


= 3) 


Q = 


3 




^ 5^/ orn ^ 0* or B ^ Sr' 


iq>4 




1^1 


> 3 






Q = 


4 


A G Sr' 


n = 0* B € 5^/ 















TABLE V. Order parameter definitions for FFS simulations of internal displacement. \A\ is the number of base pairs in set A. A ^ Sr 
indicates that the set of interactions A is in the set of sets Sr. SrQ is the set corresponding to the initial (misaligned) state of the system. 
Sj,f is the set corresponding to any other possible target state. For repetitive sequences at T = 300 K, misaligned structures with at least 
four base pairs (14— |ri| > 4) are considered, and the pseudoknots {ri,r2} = {—4,8}, {—6,6}, {—6,8}, {—8,4} and {—8,6}. For repetitive 
sequences at T = 340.9 K, only registers with at least eight base pairs (14 — |ri| > 8) are considered, and pseudoknots are not. n = 0* 
indicates that nearly formed base pairs are forbidden, except in pseudoknot cases when nearly-formed base pairs from the relevant registers 
are allowed. The symbol '~' indicates that no restriction is placed on this generalised coordinate except those that follow implicitly from 
the other requirements. We use two energy cutoffs to monitor base-pairing: Ea = — 1.43 kcal mol"-*^ and Eb = — 1.79 kcal mol"-*^ . During 
simulations, the system was also monitored to check for dissociation (when d^in 

> 5.11 nm). 



{ri,r2} = {-4,8}, {-6,6}, {-6,8}, {-8,4} and 
{—8,6} were long-lived metastable states, as dis- 
placement of one arm by the other is limited. 

• Given criteria for identifying base-pairing interac- 
tions, a given state of the system will have a set of 
interactions A. Let A be in the set of sets Sr if A 
is characteristic of the bonding pattern in register 
r = n or pseudoknot r = {ri, r2}. For purely mis- 
aligned structures, we take A to be in Sr^ if every 
possible base pair in register ri is present, with no 
other interactions. Pseudoknots have a greater de- 
gree of heterogeneity (base pairs can be exchanged 
between registers). For our purposes, a set of in- 
teractions A is in 5'ri,r2 if ^ind only if each of the 
registers has at least 6 interactions, and no other 
interactions are present. 

• The separation dmin is the minimum distance be- 
tween hydrogen-bonding sites over all pairs of bases 
in the two strands. 

• One way of identifying interactions is through the 
nearly formed base pair. A potential base pair be- 
tween the strands is counted as nearly formed when 
the conditions outlined below hold. 

— The separation of hydrogen-bonding sites is 
< 0.85 nm. 

— The hydrogen-bonding potential consists of a 
separation dependent factor multiplied by a 
number of modulating angular factors. At 
most one of these factors that contributes mul- 
tiplicatively to the hydrogen-bonding energy 
is zero. 

— The hydrogen-bonding interaction is less neg- 
ative (weaker) than — 1.43kcalmol~^. Typ- 



ical hydrogen bonds have enthalpies of ^ 
— 3.6kcal mol~^. 

Physically, these conditions mean that the bases 
are close and fairly well aligned, but not forming a 
strong base pair. 

1. Hybridization of non-repetitive duplexes 

Studies of hybridization were performed using Rosen- 
bluth FFS. The order parameter Q, as detailed in Table [l| 
combines separation-based and interaction strength met- 
rics. Its complicated form is designed to optimize sam- 
pling, and reduce the possibility of two interfaces being 
crossed in a single integration time step (this is aided by 
the use of two different energy cutoffs for quantifying the 
degree of base-pairing). The condition for Q = 7 indi- 
cates a bound state in this work, and will be relevant in 
other cases. For the repetitive sequences, we simultane- 
ously measure the flux into a number of possible binding 
registers. Data from the simulations are given in Table 

mi 

2. Hybridization of repetitive duplexes 

Measurements of the initial stage of attachment were 
performed exactly analogously to the non-repetitive du- 
plexes, and the order parameter is outlined in Table |Tj In 
this case, the flux of trajectories into several metastable 
structures was measured simultaneously. At 300 K, 
purely misaligned structures with at least four base pairs 
(ri = 0, ±2, ±4, ±6, ±8, ±10) were considered, as well 
as the pseudoknots {ri,r2} = {—4,8}, {—6,6}, {—6,8}, 
{-8,4} and {-8,6}. At 340.9 K, only misaligned struc- 
tures with at least eight base pairs were considered as 
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A 


2 


4 


Initial register 
6 


8 


10 


Number of simulations 
for flux across A°i 


10 


10 


10 


10 


10 


Initialization time 
per simulation /ns 


8.6 


8.6 


8.6 


8.6 


8.6 


Crossings of A° i 
(time taken/ /xs) 


1000 (39) 


1000 (54) 


1000 (52) 


1000 (46) 


1000 (41) 


Flux across A?_i 


25.9 ±0.6 


18.7 ±0.8 


19.4 ±0.7 


21.5 ±0.7 


24.4 ± 1 


Total trajectories 
started from A° i 


10000 


5000 


5000 


6000 


5000 


Target interface 

xh 

A? 
Ai 

xt 


Success probability (attempts per trajectory) 
0.155 ±0.006 (20) 0.258 ±0.01 (20) 0.275 ± 0.01 (20) 0.260 ± 0.01 (10) 
0.239 ±0.02 (20) 0.232 ± 0.02 (20) 0.224 ± 0.01 (20) 0.261 ± 0.02 (10) 
0.170 ± 0.02 (20) 0.235 ±0.02 (20) 0.248 ± 0.01 (20) 0.504 ± 0.02 (10) 
0.115 ±0.008 (6) 0.193 ±0.008 (6) 0.355 ± 0.03 (6) 0.807 ± 0.03 (10) 


0.289 ±0.01 (10) 
0.401 ± 0.02 (3) 
0.860 ±0.02 (3) 
0.971 ± 0.005 (2) 


Total reactive 
trajectories found 


775 


1052 


1365 


1309 


1245 


B 


-2 


-4 


Initial register 
-6 


-8 


-10 


Number of simulations 
for flux across A?_i 


10 


10 


10 


10 


10 


Initialization time 
per simulation /ns 


8.6 


8.6 


8.6 


8.6 


8.6 


Crossings of A° i 
(time taken/ fis) 


1000 (41) 


1000 (54) 


1000 (54) 


1000 (50) 


1000 (42) 


Flux across A° i 


24.5 ±0.7 


18.4 ±0.6 


18.6 ±0.6 


20.2 ±0.6 


24.1 ±0.5 


Total trajectories 
started from A^i 


10000 


10000 


10000 


8500 


8500 


Target interface 
Aj 
A? 

^2 
^3 


Success probability (attempts per trajectory) 
0.182 ±0.009 (20) 0.266 ±0.01 (20) 0.273 ± 0.02 (20) 0.276 ± 0.01 (10) 0.308 ± 0.009 (10) 
0.240 ± 0.02 (20) 0.216 ± 0.02 (20) 0.243 ± 0.02 (20) 0.282 ± 0.01 (10) 0.436 ± 0.02 (3) 
0.221 ± 0.03(20) 0.212 ± 0.01 (20) 0.264 ± 0.02 (20) 0.482 ± 0.03 (10) 0.863 ± 0.01 (3) 
0.112 ±0.003 (6) 0.203 ± 0.007 (6) 0.344 ± 0.04 (6) 0.774 ± 0.02 (10) 0.960 ± 0.005 (2) 


Total reactive 
trajectories found 


823 


2139 


2731 


1957 


2281 



TABLE VI. Details of Rosenbluth FFS simulations performed to estimate the kinetics of internal displacement at 300 K, for 
positive (A) and negative (B) registers. The top half of each table reports the initial simulations of flux across the A^i interface. 
The bottom half contains the simulation data from the later stages: the total number of trajectories initiated at A?_i, the success 
rates of attempts to reach the next interface and the number of attempts per trajectory performed at each interface. 
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Initial register 
2 4 6 



i> UlllUcl Ui bllllLliclLlUnb 

for flux across A° i 


1 n 


1 n 


1 n 


Initialization time 
per simulation /ns 


8.6 


8.6 


8.6 


Crossings of 
(time taken //is) 


iUUU l^J-J- j 


iUUU 


1 nnn /'i n^ 

iUUU i^J-Uj 


Flux across X^i 


92 ±2 


93 ±3 


101 ±3 


Total trajectories 
started from A?_i 


6000 


6000 


6000 


Target interface 
A? 

^3 


Success probability (attempts per trajectory) 
0.195 ±0.008 (20) 0.248 ±0.01 (20) 0.234 ± 0.01 (20) 
0.294 ±0.02 (20) 0.330 ± 0.01 (20) 0.313 ± 0.02 (20) 
0.284 ±0.02 (20) 0.323 ± 0.01 (20) 0.433 ± 0.03 (20) 
0.142 ± 0.01 (6) 0.223 ± 0.01 (6) 0.311 ± 0.02 (6) 


Total reactive 


1285 


921 


971 



trajectories found 



TABLE VII. Details of Rosenbluth FFS simulations performed to estimate the kinetics of internal displacement at 340.9 K. The 
top half of the table reports the initial simulations of flux across the X^i interface. The bottom half contains the simulation 
data from the later stages: the total number of trajectories initiated at A^i, the success rates of attempts to reach the next 
interface and the number of attempts per trajectory performed at each interface. 



Order parameter Separation Set of base Nearly-formed Set of base Number of base Number of base 
Q d/nm pairs A base pairs n pairs B pairs X from Sro pairs Y from SrQ 

with E < Ea with E < Eb with E <0 with E < Ey 



Q = -2 
Q = -l 
Q = 



Q = m, l<m< 12- \ro\ ^ 



Q = 12-|ro| 
Q = 13-|ro| 



< 5.11 
> 5.11 



Ae Sro n = B e Sro ^ 

A^ Sro OT n^O OT B ^ Sro X > 13 - |ro| Y > 13 - \ro\ 

- - - (X < 13- |ro| & y < 13- |ro|) 

& 

(X>12-|ro| ory >ll-|ro|) 

- - - (X < 13- |ro| -m & y < 12- |ro| -m) 

(X > 12 - |ro| - m or y > 11 - |ro| - m) 
A^ Sr' or n^O* OT B ^ Sr' X <1 Y = 



TABLE VIII. Order parameter definitions for direct FFS simulations of dissociation of misaligned duplexes at 300 K. Order parameters 
were different for different initial states ro due to the differing number of base pairs. 14 — |ro| gives the total number of base pairs possible 
in the initial misaligned register ro- r' is any other possible bound structure, misaligned structures with more than four base pairs are 
considered as possible structures r^, along with the pseudoknots {ri,r2} = {—4,8}, {—6,6}, {—6,8}, {—8,4} and {—8,6}. n = 0* indicates 
that nearly formed base pairs are forbidden, except in pseudoknot cases when nearly-formed base pairs from the relevant registers are 
allowed. The symbol indicates that no restriction is placed on this generalised coordinate except those that follow implicitly from the 
other requirements. During simulations, the system was also monitored to check for the formation of alternative bound structures, through 
the criteria A e S^/ & n = 0* & B G 5^/ . The energy scales Ea = -1.43 kcal mol~^ , Eb = -1.79kcalmol~^ and Ey = -0.596 kcalmol"^ 
are used here. 
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Order parameter 


Separation Set of base 


Nearly- formed 


Set of base 


Number of base 


Number of base 


Q 


d/nm pairs A 


base pairs n 


pairs B 


pairs X from Sro 


pairs Y from Sro 




with E < Ea 




with E < Eb 


with ^ < 


with E < Ey 



Q = -2 
Q = -1 
Q = 



Q — l<m<6— I 



Q = 6- ||ro| 



Q = 7 



^0 







< 5.11 
> 5.11 



A ^ Sro or n OT B ^ ^r, 



A ^ Sr' OT n ^ OT B ^ Sr 



X > 13- |ro| Y > 13- |ro| 

{X < 13- |ro| & y < 13- |ro|) 
& 

(X > 11 - |ro| or y > 10- |ro|) 
(X < 13 - |ro| - 2m & y < 12 - |ro| - 2m) 
& 

(X > 11 - |ro| - 2m or y > 10 - |ro| - 2m) 

X <i y = 



TABLE IX. Order parameter definitions for direct FFS simulations of melting at 340.9 K. Order parameters were different for different 
initial configurations, due to the differing number of base pairs. 14 — |ro| gives the total number of base pairs possible in a misaligned 
register ro, with ro being the initial register of the system, r' is any other possible bound structure, misaligned structures with at least 
eight base pairs are considered as possible structures r', and pseudoknots are not included. The symbol indicates that no restriction is 
placed on this generalised coordinate except those that follow implicitly from the other requirements. During simulations, the system was 
also monitored to check for the formation of alternative bound structures, through the criteria A G S^f Sz n = Sz B ^ S^f . 



10 



Initial register 



-10 



Simulations run 
for flux across A^i 



10 



Initialization time 
per simulation /ns 



8.5 



8.5 



Crossings of A^i 39453 (5.3) 79915 (9.0) 99442 (9.4) 90096 (8.4) 
(time taken / /xs) 



Flux across X^i 
/ns-^ 



7.33* 



8.78 ±0.28 



10.6 ±0.1 



10.6 ±0.1 



Target interface Total attempts/successes at later stages 

Aj 20000 / 856 50000 / 2107 130000 / 6745 140000 / 7509 
A? 40000 / 1497 90000 / 1807 150000 / 10253 150000 / 10016 
Ai 3000 / 374 10000 / 2285 50000 / 3484 50000 / 3437 
A| N/A N/A 20000 / 3689 20000 / 4336 

At N/A N/A 12488 / 230 7000 / 111 



TABLE X. Simulation results for direct FFS simulations of melting of misaligned duplexes at 300.0 K. The top half of the table 
describes the initial flux simulations, and the bottom half contains the data from subsequent interfaces. The number of reactive 
pathways in direct FFS is simply the number of successes at the final interface. *Only two initial calculations of flux were run 
for this state, making the calculation of errors unreliable. Note that, particularly for the registers ±10, internal displacement 
processes were also observed during simulations. These processes are not well described by the FFS order parameter of melting, 
meaning that they are not accurately sampled in these simulations. An uneven presence of these displacement trajectories 
for registers ± 10 cau ses the large differences between individual entries for the two registers in the table. The overall rate of 
melting (Table XVI), however, is similar for the two cases, as it should be. This also suggests that the error on the register 10 
estimate is reasonably small. 



metastable targets, because other structures melt rapidly 
and require no explicit treatment. Details of the simula- 
tions are given in Table IV 



3. Internal displacement of misaligned repetitive 
sequences 



Measurements of internal displacement were performed 
using Rosenbluth FFS. The order parameter is outlined 
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Initial register 
4 



Simulations run 

for flnv i^pro'^'^ i 

±WX ±1 L-i-A. CljV_yX WOO y\ 1 


15 


48 


20 


Initialization time 
per simulation /ns 


8.5 


8.5 


8.5 


Crossings of A^i 
(time taken / /xs) 


149747 (7.7) 


479021 (23) 


401195 (18) 


Flux across X^i 
/ns-^ 


19.4 ±0.4 


20.3 ±0.2 


22.2 ±0.2 



Target interface Total attempts/successes at later stages 



A? 



^5 



160000 / 11789 260000 / 18607 250000 / 18150 
23000 / 10425 245000 / 10791 250000 / 11634 
122500/12577 33750 / 5174 32783 / 4795 
5000 / 2717 4500 / 587 24000 / 3963 
N/A 10000 / 3417 20500 / 4618 

N/A N/A 9000 / 1399 



TABLE XI. Simulation results for direct FFS simulations of melting of misaligned duplexes at 340.9 K. The top half of the table 
shows how many independent initial flux simulations were performed, and the initialization time for each of these simulations 
before data was recorded. The bottom half contains the simulation data: the total crossings of A^i and the time simulated 
in the first stage, and the total attempts and successes of the later stages. The number of reactive pathways in direct FFS is 
simply the number of successes at the final interface. 



Number of base pairs 


1 


2 


3 


4 


5 


6 


7 8 9 


10 


11 


12 13 


14 


Biasing weight 


10^^ 


5 X 10^^ 


2 X 10^^ 




5 X 10^ 


2 X 10^ 


10^ 5 X 10^ 2000 


100 


5 


0.2 0.01 


5 X 10-^ 


W at 300 K 
























Biasing weight 


3 X 10^ 


10^ 


3 X 10^ 


10^ 


3000 


1000 


300 100 30 


10 


3 


1 1 


1 


W at 340.9 K 

























TABLE XII. Umbrella potential used to bias equilibrium simulations of the duplex state. A base pair counts as formed if it 
has an energy E < — 0.596 kcalmol"^. 



in Table [V| At 300 K, we consider the flux of trajecto- 
ries from any misaligned structure to any other align- 
ment with at least 4 base pairs, as well as the metastable 
pseudoknot states {ri,r2} = {—4,8}, {—6,6}, {—6,8}, 
{—8,4} and {—8,6}. The relaxation of these metastable 
pseudoknots proved to be too difficult to simulate re- 
liably. At 340.9 K, only alignments with at least eight 
base pairs were considered. Simulations were monitored 
to check for strand dissociation: trajectories that resulted 
in dissociation were ended and counted as 'failures' for 
the purposes of measuring the flux of internal displace- 
ment. Further details of the simulations are given in 
Table ED 



4. Dissociation of misaligned repetitive sequences 

Measurements of dissociation of misaligned structures 
were performed using direct FFS. The order parameter 
is outlined in Tables [VIII| and [TX| At 300 K, dissociation 
was studied for registers ri = ±8 and ±10. Longer mis- 
aligned structures have a dissociation rate that is negligi- 
ble with respect to internal rearrangement. Simulations 
were monitored to check for rearrangement into alterna- 
tive metastable states: trajectories that resulted in inter- 
nal displacement were ended and counted as 'failures' for 
the purposes of measuring the dissociation flux. 

At 340.9 K, where dissociation is much faster, it was 
studied for registers ri = 2, 4 and 6. Registers —2, —4 
and —6 should be related to 2, 4 and 6 by the symmetry 
of the model, and were not simulated for the sake of effi- 
ciency. The other results presented here show no signifi- 
cant asymmetry between positive and negative registers. 
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justifying this approach. Further details of the simula- 



tions are given in Tables \X\ and XI 



Allowed base T/K flux / s ^ 2 bp success 

pairs probability 



5. Characterization of the equilibrium ensemble 

To understand the kinetic results, it is helpful to 
characterize the equilibrium ensemble of duplex states. 
VMMC simulations were performed on a pre-formed 14- 
base-pair duplex at 300 K and 340.9 K, with umbrella 
sampling used to enhance the sampling of states with a 
low degree of base-pairing (but forbid full detachment). 
The bias applied is detailed in Table |XII[ Four simula- 
tions were run for 10^ VMMC steps at each temperature, 
with an initialization period of 10^ steps. For simplic- 
ity, simulations were performed on systems in which only 
native base-pairing was permitted. During these simula- 
tions, the properties of states with base-pairing energies 
consistent with the penultimate FFS interface of associa- 
tion were saved. The possible states are given below. Fol- 
lowing the notation of Tablellj let Ea = — 1.43kcalmol~^ 
and Eb = — 1.79kcalmol~ . The two classes of states 



1. One base pair with E < Eb and one other base 
pair with E ^ Ea^ with no other base pairs with 
Eb < E < Ea- 

2. One or more base pairs with Eb < E < Ea and 
one other base pair with E ^ Eb^ with no other 
base pairs with E < Eb- 

In practice, to sample these states, E ^ Ex was taken 
8iS E = Ex ± 0.03kcalmol~^. The equilibrium prob- 
ability P(n) of n base pairs with an energy of £^ < 
— 0.596 kcalmol"^ being present was also measured. 

We also compared the intrastrand enthalpy (primar- 
ily arising from nearest-neighbor stacking interactions) 
in the single-stranded ensemble at 300 K to the value ob- 
tained from averaging over configurations at the penul- 
timate FFS interface of association. The single-stranded 
ensemble was sampled in an identical fashion to the du- 
plex simulations above, except that the umbrella poten- 
tial was set to unity in the absence of base pairs, and 
zero if any base pairs were present. All states were used 
to calculate the average intrastrand enthalpy. 



6. Sequence-dependence of association rate 

Pairs of 8-base strands were simulated using the Brow- 
nian thermostat in a periodic cell of volume 2.09 x 10~^^ 1, 
at a temperature 298.15 K (the temperature used in the 
experiment of Zhang and Winfree^). For each sequence, 
the time taken for association into the full duplex struc- 
ture was measured 1000 times. In each case, the system 
was initialized in the same single-stranded configuration, 
but with distinct nucleotide velocities. Any correlation 
resulting from using the same configuration is minimal, as 



Any 300.0 (7.67 ±0.75) x 10^ 

Any 312.5 (5.68 ± 0.77) x 10^ 

Any 326.1 (3.06 ± 0.43) x 10^ 

Any 340.9 (1.34 ± 0.24) x 10^ 

Native only 300 (6.23 ± 0.47) x 10^ 



0.33 ±0.018 
0.27 ±0.028 
0.18 ±0.016 
0.078 ±0.010 
0.65 ±0.023 



TABLE XIII. Total flux from unbound to fuUy bound state for 
non-repetitive sequences. Also shown is the probability of success- 
ful completion of duplex formation once the system has reached 
the penultimate FFS interface. 



the shortest association time in the simulations is three 
orders of magnitude larger than the equilibration and 
diffusion time scales in the single-stranded state. As dis- 
cussed in Appendix |B 4[ the time spent in structures with 
interactions present between the two strands was not in- 
cluded in this estimate of the association time. Errors in 
the estimates of rates were calculated using the standard 
error on the mean of 20 independent estimates, each ob- 
tained from 50 events. Additional simulations were per- 
formed in which base pairing interactions were restricted 
to native contacts. 



Appendix D: Results 

1. Hybridization of non-repetitive sequences 

The results of hybridization simulations for non- 
repetitive sequences are given in Table |XIII| We note 
that the absolute rate at T = 300 K (7.67 x 10"^ s"^) 
would, given the concentrations used in the simulations, 
translate into a bimolecular association rate of /cbi = 
7.71 X 10^ s~^. This value is approximately 100 times 
larger than typical experimental measurements. 

As discussed in Appendix [Bj we expect coarse-grained 
models to provide faster dynamics than real systems. To 
speed up simulations we used a diffusion coefficient that 
is 16 times higher than the experimentally measured one, 
which accounts for much of the difference. We argued 
there that our predictions are most reliable when taken 
as relative rates. A graph of In /con against 1/T, with kon 
the measured association rate, is plotted in Fig.[7|A. A 
pure Arrhenius model with a single, well-defined transi- 
tion state would give a straight line in such a plot. Our 
result is evidence of the complexity of the ensemble of 
transition pathways. The frequency of initial contacts (as 
a function of location within the strand) and the proba- 
bility of successful duplex formation given those contacts 
are plotted in Figs.[7|B and[7]C. These histograms show 
that initial contacts are more likely to occur at the ends 
of the strands, but more likely to succeed if they occur in 
the centre. However, all initial attachments occur in the 
transition pathway ensemble with a reasonable frequency. 
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FIG. 7. Binding kinetics for non-repetitive sequences. A) "Arrhenius plot" of the dependence of hybridization rate on temperature, with 
kon{T) = 7.67 X 10^ s"-*^ being the rate at 300 K. The Arrhenius model with a constant activation enthalpy predicts a straight line on 
such a plot. B) Frequency with which a certain base pair appears in states drawn from the penultimate interface of FFS simulations, for 
a sequence with native-only reactions at 300 K. Base pairs with indexes and 14 are at either end of the duplex. Probability of successful 
duplex formation given the formation of a base pair at the penultimate FFS interface, as a function of base pair index. 



2. Characterisation of the ensemble of transition 
pathways, the equilibrium duplex ensemble and the 
equilibrium single-stranded ensemble 

Equilibrium probabilities P(n) of n base pairs with 
an energy of £^ < —0.596 kcal mol~^ being present in 
the duplex ensemble (when only native base pairs are 
permitted) at 300 K and 340.9 K are plotted plotted in 
Fig.|8]as a free energy F{n)/RT = -ln(P(n)). Fig. [s] 
shows explicitly that adding a new base pair results in 
a substantial gain in free energy, even at 340. 9K. Only 
taking into account this secondary-structure analysis of 
the thermodynamic driving force implies that metastable 
states with one or two base-pairs formed should have a 
high probability of ending up in the duplex state. 



Ensemble 



Interstrand 
enthalpy /kcal mol~ 



Av. base pair 
separation /nm 




2 4 6 8 10 12 14 
Number of base pairs 

FIG. 8. Free-energy profile of the duplex state for a 14-base pair 
duplex with interactions exclusively between native base pairs, 
measured at 300 K and 340.9 K. A base pair is present if its en- 
ergy is below —0.596 kcal mol~^ . Errors on the points are of order 
0.1 kcalmol"-*^. The arbitrary offsets of the free energies are chosen 
so that the minimum of both curves has a value of zero. These 
simulations did not sample the unbound state, and hence are con- 
centration independent. We do not show the single-stranded state 
(0 base pairs) because this is concentration dependent, in contrast 
to the rest of the free-energy profile. For examples with this state 
included see e.g. Refs l27ll29l 

Despite this argument based on equilibrium free ener- 
gies of secondary structure, analysis of the FFS simula- 



Binding (kinetic) -7.04 
Duplex (equilibrium) -10.2 



2.84 
2.10 



TABLE XIV. Properties of ensembles observed during binding 
and at equilibrium. The binding ensemble consists of states ob- 
tained from kinetic simulations at the penultimate FFS interface 
(the last one before duplex formation), the equilibrium ensemble 
are states that satisfy the same base-pairing criteria, but drawn 
from an equilibrium sampling of the duplex state. 



tions of association indicate that states with initial con- 
tacts are surprisingly likely to fail to form a full duplex. 
The penultimate FFS interface in the simulations of as- 
sociation corresponds to configurations in which two base 
pairs are present with energy significantly more negative 
than E < —0.596 kcal mol~^. As can be seen from Table 
XIIIl simulations launched from this interface have a 33% 



chance of successfully zippering to form the full duplex at 
300 K, dropping to 7.8% at 340.9 K. Even in the absence 
of non-native bonds, the success rate is only 65% for con- 
figurations loaded at this interface at 300 K. As systems 
at this interface can either proceed to full duplex forma- 
tion or separate with roughly equal probability at 300 K, 
configurations stored at this interface are approximately 
representative of the 'transition ensemble' of the system. 

In the main text, we argue that these transition states 
are not representative of equilibrium states with the same 
degree of base pairing. To establish this, we compare 
states from the penultimate interface of FFS simulations 
of association in which only native base-pairs were per- 
mitted with states obtained from the equilibrium ensem- 
ble that satisfy the same base-pairing criteria. For the 
two ensembles, the average separation of native base- 
pair contacts and the average overall interstrand enthalpy 
were measured. The results are given in Table \XIV\ 

The difference in interstrand enthalpies between the 
two ensembles is primarily due to stronger stabilizing 
cross-stacking interactions in the equilibrium configura- 
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Register 



Flux / s-^ 
300 K 



340.9 K 



Correct bonding 






(5.61 db 0.49) X 10^ 


1.15 X 10^ 


ned bonding 






2 


(4.99 ± 0.39j X 10 


1.51 X 10 


-2 


(3.68 ± 0.38j X 10 


l.z4 X 10 


4 


(3.94 ± 0.44) X 10 


8.32 X 10 


-4 


(3.96 ± 0.41) X 10^ 


1.35 X 10^ 


6 


(3.58 =b 0.45) X 10^ 


6.41 X 10^ 


-6 


(3.17 ±0.40) X 10^ 


8.64 X 10^ 


8 


(3.02 ±0.32) X 10^ 




-8 


(2.68 ±0.35) X 10^ 




10 


(3.27 ±0.46) X 10^ 




-10 


(2.34 ±0.23) X 10^ 




knot bonding 






-4,8 


173 




-6,6 


801 




-6,8 


308 


r\j 


-8,4 


14.6 




-8,6 


7.29 





TABLE XV. Total flux from unbound to bound states of various 
alignments for a repetitive sequence. We interpret these fluxes as 
rates of instantaneous reactions. Pseudoknots rarely form before 
any complete register: due to this rarity, relative errors for the 
pseudoknotted results are approximately 100%. 



tions. It is not this difference in enthalpy itself, however, 
that explains our results. In the FFS simulations of disso- 
ciation of register —10 at 300 K, for simulations launched 
from the penultimate interface prior to separation (A^) 
full dissociation was observed in 2237 trajectories and 
reformation of the full duplex in 5173, despite an aver- 
age interstrand enthalpy of only — 1.28 kcal mol~^ at this 
interface. Note that many trajectories launched from 
reached alternative met ast able states, rather than disso- 
ciating - in the figures given above we only consider tra- 
jectories launched from configurations at A^ in which no 
other register of bonding is present, explaining the differ- 
ence between the numbers quoted and those in Table [X] 
Rather, the weaker interactions and greater distance be- 
tween hydrogen-bonding sites in the kinetic ensemble in- 
dicate that the geometry of the two strands is not gener- 
ally conducive to full hybridization, and not refiective of 
states with comparable enthalpy in the bound ensemble. 
As a result, there is a reasonable probability of strands 
dissociating even after making initial contacts with signif- 
icant interstrand interactions, and therefore the process 
of duplex formation has a non-negligible negative activa- 
tion enthalpy. 

We have also noted a competing contribution to the ac- 
tivation enthalpy by comparing the intrastrand interac- 
tions in the equilibrated single-stranded state with those 
in the ensemble of states from the penultimate FFS inter- 
face. For the unbound ensemble, we obtain an average of 
— 133kcalmol~^, compared to — 129kcalmol~^ from the 
states at the penultimate interface of FFS simulations. 



In the absence of intrastrand base-pairing, this difference 
is attributable to less effective stacking of the individ- 
ual strands in the hybridization ensemble. Disrupting 
stacking makes it easier for the strands to be in contact 
without being fully bound, and also optimal stacking con- 
figurations are not consistent with duplex geometry.!^ 
The difference in intrastrand enthalpies contributes to 
the overall activation enthalpy of binding, tending to 
make it less negative. However, for our model, the ef- 
fect of disrupting stacking is smaller contribution than 
other effects that favour a negative activation enthalpy. 



3. Hybridization of repetitive sequences 

The results of the FFS simulations of the initial hy- 
bridization of repetitive sequences at 300 K are given in 



Table XV The probability of formation of each register 
ri is approximately proportional the number of bonds 
available, 14 — |ri|. The results of FFS simulations of re- 
arrangement and dissociation at 300 K are summarised in 



Table XVI Equivalent results for simulations at 340.9 K 



are given in Tables XVII and XVIII 

As is evident, initial misaligned structures with more 
than four base pairs tend to rearrange into structures 
with a greater degree of base-pairing at 300 K. Regis- 
ters ri = ±10, with only four base pairs, have a simi- 
lar probability of forming a more strongly-bound duplex 
and dissociating. We were unable to reliably simulate 
the relaxation of the relatively stable pseudoknot struc- 
tures {ri,r2} = {-4,8}, {-6,6}, {-6,8}, {-8,4} and 
{—8,6}. However, given that each register present in 
these metastable pseudoknots can form at least six base 
pairs, it seems likely that these structures would eventu- 
ally relax to the fully-formed duplex. We emphasize that 
unlike the six pseudoknots listed above, most pseidoknots 
relax to a single register reasonably quickly. 

To estimate the rate of formation of the fully-formed 
duplex, we therefore sum over the rate of formation of 
all structures from the initial simulations, with the ex- 
ception of registers ri = ±10. In these cases we take the 
fraction of structures that rearrange into another struc- 
ture with a higher degree of base-pairing as the fraction 
that eventually form a full duplex. The result thus ob- 
tained is kon = 3.8 X 10^ s~^, approximately five times 
larger than the result for non-repetitive sequences given 



in Table XIII As justified in Appendix |B 4[ this analysis 
ignores the time spent in the metastable intermediates. 

At 340.9 K, dissociation is a non-negligible pathway 
even for the most stable misbonds, ri = ±2. To an- 
alyze this case, we considered only the two most likely 
routes out of each metastable state, those highlighted in 
green and yellow in Table pTVIIII Calculation of the over- 
all transition rate into the ri = state is then a relatively 
simple problem, yielding /con = 4.0 x 10^ s~^, almost 10 
times smaller than for the same sequences at 300 K (this 
calculation assumes the negative registers behave identi- 
cally). 
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result 


2 


-2 


4 


-4 


Initial 
6 


register 
-6 


8 


-8 


10 




-10 





1.85 X 10^ 


2.68 X 10 


1.55 X 10^ 


298 


128 


41.4 


1.04 X 10^ 


5.04 X 10^ 


4.74 X 


10^ 




5.30 X 10^ 


2 






4.95 X 10 




867 




1.15 X 10^ 


9.51 X 10^ 




2.59 X 


10^ 




3.05 X 10^ 


-2 


34.0 


(-^^ 




A.bA X 10 


30.9 


957 


1.01 X 10^ 


3.95 X 10^ 


3.33 X 


10^ 




3.23 X 10^ 


4 


299 








8.23 X 10 


1.43 X 10 


2.85 X 10^ 


2.82 X 10^ 


2.88 X 


10^ 


4.81 X 10^ 


-4 




2.16 






2.96 X 10 


8.89 X 10 


1.91 X 10^ 


4.42 X 10^ 


2.81 X 


10^ 


3.60 X 10^ 


6 






45.9 






5.46 


1.31 X 10^ 


1.38 X 10^ 


1.55 X 


10^ 


h.76 X 10^ 


-6 








212 




r-J 




1.55 X 10^ 




2.69 X 


10^ 


2.60 X lO'^ 


8 










93.7 








^.49 X 


10^ 


1.00 X 10^ 


-8 












586 






8.10 X 


10^ 


6.54 X 10^1 


10 
-10 














409 


183 
211 


9.98 X 


10^ 


1.99 X 10^ 


-4,8 














9.91 X 10^ 




2.40 X 


10^ 


1.87 X 10^ 


-6,6 










1.72 X 10^ 


1.96 X 10^ 


462 


1.60 X 10^ 






5.92 X 10^ 


-6,8 












1.28 X 10 


1.52 X 10^ 




2.89 X 


10^ 


2.85 X 10^ 


-8,4 










11.7 


84.3 




^5.58 X 10^ 


1.21 X 


10^ 


1.52 X 10^ 


-8,6 










1.64 X 10^ 






1.80 X 10^ 






1.04 X 10^ 


-8,8 














9.84 X 10^ 




8.81 X 10^ 


1.42 X 


10^ 


7.65 X 10^ 


melt 














8.92 X 10^ 


9.04 X 10^ 


1.47 X 


10^ 




1.71 X 10^ 



TABLE XVI. Total flux from misaligned states to other (meta)stable states for the repetitive sequence at 300 K. Blank spaces indicates 
events that could potentially have occurred during simulations, but were not observed. indicates transitions that were not sampled. 
The most common transition for each initial state is highlighted in yellow, other reasonably frequent results are highlighted in green. 
Standard errors on the estimates for internal displacement are on the order of 15% for the most common results, rising to 100% for less 
frequently observed traditions. Standard errors on the estimates of melting are in the range 10 - 25%. 



Register 


Flux / s-^ 


Correct bonding 







(1.15 ±0.19) X 10^ 


misaligned bonding 




2 


(1.51=1=0.35) X 10^ 


-2 


(1.24 ±0.24) X 10^ 


4 


(8.32 ± 1.4) X 10^ 


-4 


(1.35 ±0.33) X 10^ 


6 


(6.41 ±2.8) X 10^ 


-6 


(8.64 ± 1.5) X 10^ 



TABLE XVII. Total flux from unbound to various misaligned 
states for the repetitive sequence at 340.9 K. We interpret these 
fluxes as rates of instantaneous reactions. 



4. Sequence-dependence of association rate 

The relative association rates of eight base-pair du- 
plexes with varying sequence, obtained using the proto- 
cols outlined in Appendix |C 6| are presented in Table 
|XIX[ Also shown are the rates of displacement fitted by 
Zhang and Winfre^ for G-C-rich, A-T-rich and average- 
strength toeholds, in the limit of long toehold lengths (we 
take our eight-base sequences from this source). These 
rates are assumed to reflect the association rates of the 
toeholds themselves.^ 






1.66 X 10^ 


2 




-2 


2.39 X 10^ 


4 


4.22 X 10^ 


-4 


763 


6 




-6 


84.6 


melt 


T39 X 10^ 



2.37 X 10^ 



4.54 X 10^ 
4.54 X 10^ 

179 
6.24 X 10^ 
1.42 X 10^ 

4.40 X 10^ 



1.37 X 10^ 

1.27 X 10^ 

1.21 X lO'^ 

7.73 X 10^ 

1.27 X 10^ 

5.03 X 10^ 

3.64 X 10^ 



TABLE XVIII. Total flux from misaligned states to other 
(meta)stable states for the repetitive sequence at 340.9 K. Blank 
spaces indicates events that could potentially have occurred during 
simulations, but were not observed. indicates transitions that 
were not sampled. The most common transition for each initial 
state is highlighted in yellow, other reasonably frequent results are 
highlighted in green. Standard errors on the estimates for internal 
displacement are on the order of 10% for the most common results, 
rising to 100% for less frequently observed traditions. Standard 
errors on the estimates of melting are approximately 5%. 



Appendix E: Detailed comparison of oxDNA with 3SPN.1 

Here we discuss the differences between our results and 
those for 3SPN.1, an alternative model of DNA. This 
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Sequence Type Relative binding rate Bimolecular rate constants 

from Ref.^/ M'^s"^ 

misaligned bonds allowed 
5'-CCCGCCGC-3' G-C-rich 1 6 x 10^ 

5'-TCTCCATG-3' average-strength 0.28 ± 0.05 3 x 10^ 

5'-ATTTATTA-3' A-T-rich 0.14 ± 0.02 4 x 10^ 

misaligned bonds not allowed 
5^-CCCGCCGC-3^ G-C-rich 0.32 =b 0.06 

5'-ATTTATTA-3' A-T-rich 0.10 =b 0.02 



TABLE XIX. Binding rates of 8-base strands of different sequences at 298.15 K, relative to the G-C-rich case. Reaction rates 
were measured for the strands shown and their complements. Reaction constants are taken from the long-toehold limit of the 
fits in Ref.[9] 



discussion is needed because 3 SPN.l has also been used 
to study hybridisationp^'^^ElHini finding some similar re- 
sults (such as transitions being complex) but, impor- 
tantly, finding significantly different pathways towards 
hybridization. There are several major differences be- 
tween oxDNA and 3SPN.1 that are relevant to this anal- 
ysis. 

• Single-stranded DNA in 3SPN.1 consists of un- 
physically stiff helices,!^ whereas single strands in 
oxDNA can unstack and hence are more flexible, 
with a greater degree of conformational freedom. 
The importance of treating the extra flexibility of 
ssDNA relative to duplexes is evident in the forma- 
tion of single-stranded hairpin an d in the force- 
extension properties of ssDNA,'^^^^ both o f which 
are accurately reproduced by oxDNA.'^^'^ 

• The base-pairing interaction in oxDNA is strongly 
modulated by orientation of the nucleotides^^, 
meaning that the edges of bases must point at each 
other to form bonds. This reflects the strongly 
directional nature of hydrogen bonding. 3SPN.1 
has several beads for each nucleotide, but all inter- 
actions between beads are isotropic. Thus bond- 
ing can occur in configurations in which the bases 
are close to each other, but not in a realistic 
orientation for hydrogen-bonding. As discussed 
by Florescu and Joyeud^, these isotropic inter- 
actions can even lead to unphysical stable states 
for poly(dA)-poly(dT) in which each nucleotide is 
bound to two others (although Florescu and Joyeux 
studied an earlier version of the model, 3SPN.O,^^ 
the hydrogen-bonding geometry is unchanged in 
3SPN.1). 

• 3SPN.1 contains an attractive interaction between 
sugar sites that was introduced to mediate the hy- 
bridization reaction. This attraction provides a 
stabilizing contribution to the system when the sin- 
gle strands are in close proximity to each other, but 
not bound with hydrogen bonds. Sambriski et al^ 
justified this term by referring to the tendency of 
DNA duplexes to condense in the presence of mul- 
tivalent ions, but its role in a model parameterized 
for monovalent ions is unclear. 



Next we discuss how these differences play out for the 
dynamics of hybridisation. Perhaps the most important 
geometric difference between the two models is the fact 
that the single strands in 3SPN.1 are stiff and helical. 
We show that zippering in oxDNA occurs because the 
single-strands are relatively flexible: double helices form 
in stages as bases stack onto the end of the growing du- 
plex. The stiffness of the duplex itself is an emergent 
property, rather than being imprinted at the level of the 
single strands. By contrast, in 3SPN.1, hybridisation oc- 
curs through the association of two fairly stiff helices, for 
which the most natural pathway is probably the winding 
referred to in the detailed study by Schmitt and Knotts.'^ 

Another way the flexibility of the strands plays an 
important role involves the mechanism of internal rear- 
rangement. The intermediates of internal displacement, 
involving bulged or pseudoknotted states, require signif- 
icant flexibility in the single strands, and hence these 
processes will be suppressed by the stiff single strands in 
3SPN.1. 

Instead of using internal displacement, repetitive 
strands in 3SPN.1 can slither past each othei^^ in a 
mechanism 'devoid of significant energy barriers'. This 
ability to 'slither' suggests that a similar sliding mech- 
anism may also explain how initially misaligned non- 
repetitive duplexes relax to the native state.SSESl Slith- 
ering is not observed in oxDNA. In order to undergo 
slithering, the strands must slide relative to each other 
along the duplex axis. Performing such an operation with 
oxDNA would be extremely costly: the system would 
have to move through an intermediate state in which all 
base pairs were broken but the strands were still held in 
a double helical orientation, wrapped around each other. 
Thus reaching the intermediate state involves an enor- 
mous enthalpic cost, with little entropic gain to com- 
pensate. By contrast, in 3SPN.1 this process, is 'devoid 
of significant energy barriers' for repetitive sequences 
with a repeat unit of two bases. Several factors con- 
tribute to this difference. Firstly, the isotropic nature 
of interactions means that hydrogen-bonding need not 
be fully disrupted during slithering. Secondly, the at- 
traction between sugar sites stabilizes a state in which 
the two strands are wrapped round each other, but not 
base-paired. Finally, the fact that 3SPN.1 helices are 
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so stiff means that the conformational freedom of single 
strands is significantly reduced. Therefore the fact that 
they must remain helical during the slithering process in- 
curs a relatively smaller entropic penalty than in oxDNA, 
meaning that it is a viable alternative to dissociating. We 
note that, although internal displacement via inchworm 
and pseudoknot intermediates can occur in oxDNA, both 
processes nevertheless involve significant free-energy bar- 
riers associated with initiating the displacement. 

Any coarse-grained DNA model makes compromises 
between accuracy and tractability. In fact such mod- 
els will never simultaneously reproduce all the properties 
of DNA, a general attribute of effective coarse-grained 
systems sometimes called "representability problems".^ 
OxDNA was specifically designed in order to reproduce 
hybridization thermodynamics as well as the mechanical 
properties of both single and double strands. We ar- 
gue here that capturing the strand flexibility as well as 
the orientational dependence of the effective potentials 
is crucial if one wants to reproducing the gross features 
of the hybridization kinetics we focus on in this paper. 



Our success at quantitatively reproducing relative rates 
measured for strand displacement systems^ gives us con- 
fidence in our predictions of similar physical phenomena 
in hybridisation. 

3SPN.1 has some advantages over oxDNA. For exam- 
ple, 3SPN.1 explicitly represents the asymmetric grooves 
in DNA, allowing structural properties that are sensitive 
to this feature to be modelled. Electrostatic screening ef- 
fects are also explicitly included, allowing 3SPN.1 to cap- 
ture the effects of changing salt concentrations, whereas 
oxDNA is limited to one salt concentration. Encourag- 
ingly, both models show that hybridisation can proceed 
through complex pathways. Nevertheless, we conclude 
that oxDNA's representation of hybridization, involving 
nucleation and zippering of flexible strands to form stiff 
helices and the possibility of internal displacement, is 
more likely to represent true features of real DNA. Of 
course at the end of the day, the true arbiter of all these 
predictions will be experiment, and it is likely that slith- 
ering and internal displacement will give distinguishable 
predictions as features such as the repeat length of a 
repetitive sequence are changed. 



